Trapped in a bear hug


Continuing uncertainty in Russia is bad for science and the wider world — and Putin’s stifling control 
of key appointments and private initiatives is unlikely to reverse decades of post-Soviet decline. 


As the United States goes to the polls after an unprecedentedly
vulgar election campaign, its former cold-war rival is exuding
strength and confidence. Just last week, Russia cited its proud
history of exploration and science in Antarctica when giving the green 
light to an international agreement to create a vast marine reserve in the 
Ross Sea (see page 13). Even so, Vladimir Putin’s chauvinistic approach 
to politics has a certain appeal to some — at home and abroad. 

The Russian leader's perplexing popularity in some circles detracts 
from his enduring failure to modernize his country’s government, 
society and economy. Russia’s flagging science system is in dire need
of a cure, too. But the recent appointment of Olga Vasilyeva as Putin’s
science and education minister casts strong doubts on whether the 
right healers are at work. Vasilyeva, an ultra-conservative historian, 
is mainly known for her affinity with the Russian Orthodox Church 
and her ambivalence towards Stalin. 

Russian science is still struggling to recover from decades of neglect 
and post-Soviet degradation. Its research community is isolated. 
Foreign students and scientists are a rare commodity at Russian 
universities and research institutes. A generously funded government 
scheme to attract top foreign researchers to Russian labs is hampered 
by red tape: Westerners attempting collaborative science in Russia 
often complain that the security services and customs authorities 
interfere with civil research — harassments that many Russian scien- 
tists have been quietly enduring for many years. 

According to official reading, science is held in high esteem. Putin 
and the clique of allies he has placed in key positions in industry and 
administration like to think of Russia as a technological powerhouse. 
But government programmes such as RUSNANO, a multibillion- 
rouble nanotechnology initiative launched in 2007, have by all accounts 
delivered little in terms of innovation. The Skolkovo Innovation 
Centre outside Moscow, hailed as a Russian Silicon Valley, grapples 
with allegations of embezzlement against members of its management 
body. Private initiatives and philanthropic ventures, meanwhile, are 
feeling the effects of jealous state control: the Moscow-based Dynasty 
Foundation, a rare private science-funding body, ceased operations last 
year after it was labelled an undesired ‘foreign agent’.

Even so, some growth in public science spending in recent years, 
together with plans to strengthen universities and streamline the over- 
sized Russian Academy of Sciences (which runs hundreds of research 
institutes), raised hopes that things might start to improve. Dmitry 
Livanov, a dynamic physicist who ran the science ministry from 2012, 
had seemed the right person to push through a series of reforms to get 
Russian science back on course. It came as a shock, therefore, when 
Putin fired him in August. 

The reasons for Livanov’s abrupt departure remain unclear, 
although they are likely to be political rather than related to the
performance of the ministry he headed. His dismissal might be
considered a win for the Russian Academy of Sciences and for numerous
low-profile universities, which he intended to close. But wins in these
cases would not be a win for Russian science at large. 

The arrival of his successor has created fresh uncertainty. In one of 
her first moves, on 28 September, Vasilyeva announced her intention 
to suspend planned university mergers and expressed doubts about 
the future of a government programme to create five world-class
universities by 2020. Then last month she said that Russian scholars
and scientists should be assessed primarily on the basis of their
publications in Russian-language academic journals. Scientists were
puzzled: would the proposed system apply only to the humanities or to
all fields of research, in which case it would render much of Russian
science a footnote in global terms? Her ministry has
failed to respond to requests for clarification from Nature.

Livanov’s removal and Vasilyeva’s awkward manoeuvres during
her first months in office indicate that science and science-related 
affairs are becoming increasingly subject to Putin’s autocratic schem- 
ing. Moscow’s suspension last week of an agreement with the United 
States on cleaning up weapons-grade plutonium fits with that view. 

But no matter how tense the geopolitical climate, Putin must under- 
stand that isolationism leads to a dead end in both science and politics. 
Livanov’s reforms should continue, and Vasilyeva must urgently
provide Russia’s anxious research community with a clear outlook. Putin
and his ministers must also strive for more constructive international 
collaboration, in science and in other spheres. Russia cannot go it 
alone, either in science or in Syria. One can only hope that its consent 
to join efforts to protect the high seas might herald a new era.




Spots and stripes 


Deciphering the genes at play in the patterning 
of mammalian coats. 


The Argentine writer Jorge Luis Borges tells the story of an Aztec
priest, Tzinacan, imprisoned by the conquistador Pedro de
Alvarado. In the cell next door is a jaguar: Tzinacan becomes
convinced that the jaguar’s spots are not random blotches, but contain
a message from his God that, could he decipher it, would offer a key to
his escape. Any reader of Borges learns to appreciate his playful
mash-up of fact and fiction. Of the small cast of characters in La Escritura de
Dios (The God’s Script), de Alvarado was a real person, and Tzinacan
probably an invention, although with Borges one can never be sure.


3 NOVEMBER 2016 | VOL 539 | NATURE | 5 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


| THIS WEEK | EDITORIALS 


Jaguars, though, definitely exist, and — like many mammals — have 
a pattern of spots that fascinates and tantalizes. 

Understanding the origins of variegated colour patterns in mamma- 
lian fur is an abiding problem in biology. Other animals adopt a range 
of pigments, and even use optical effects such as iridescence to lend a
chromatic gloss, yet the mammalian palette is mainly monochrome. 
A patch of skin either contains melanocytes, or it doesn't. 

This week, researchers report in Nature some progress on the 
problem with the African striped mouse, Rhabdomys pumilio 
(R. Mallarino et al. Nature http://dx.doi.org/10.1038/nature20109; 
2016). This creature has a stripe on either side of its spine, each a 
sandwich of light-coloured hair between two outriders of pure black. 
The rest of the mouse is an intermediate shade, except for a pale belly. 
The pattern starts to emerge long before a mouse is born. 

The difference is down to gene expression. The white stripes are 
enriched in transcripts of Alx3, a transcription factor, which curbs the 
activities of a gene called Mitf. If left unhindered, this gene would allow 
melanocytes to differentiate and produce dark pigment. 

As model organisms go, R. pumilio is very different from the labora- 
tory mouse. Even further removed is the Eastern chipmunk, Tamias 
striatus. Chipmunks are more closely related to squirrels than to mice: 
the last common ancestor of mouse and chipmunk lived when dinosaurs 
did. Yet the formation of chipmunk stripes is governed by essentially the 
same processes that create the patterning in mouse skin, even though the 
mechanisms might have evolved independently in each case. 

Study of the chipmunk shows other genes involved. Expression of 
one called Asip in lighter areas, another called Edn3 in darker, show 
that patterning is not down to a single genetic interaction. The work of 
Edn3 and other genes, we know, writes the script of spots and stripes 
in cats, from tabbies to cheetahs (C. B. Kaelin et al. Science 337, 
1536-1541; 2012) — and so, presumably, in the coat of the jaguar that 
Tzinacan longed to decipher. 


Much remains to be learnt. The stripes of mice and chipmunks 
don't occur in the same places on the animal, and scientists still do 
not understand why the grass mouse Lemniscomys rosalia has only 
one stripe, whereas the ground squirrel Ictidomys tridecemlineatus 
has thirteen. The God’s script comes in many dialects. 

Skin pigmentation is superficial — literally — but the genes that create 
these patterns often have other, more profound purposes. The skin and 
hair of vertebrates derives from the neural crest, an embryonic tissue 
unique to vertebrates, which, migrating from the edge of the neural
plate as it rolls up to create the spinal cord, interacts with tissues
all over the body to create structures seen nowhere else in the kingdom
of life. The neural crest sculpts not just hair, teeth and skin, but a
long list of attributes, from the bones of the face to the nerves that
line the intestines, parts of the heart and adrenal
glands, and many crucial components of our sense organs. This is
why oddities of skin pigmentation sometimes betoken deeper ailments. 
It explains why cats that are white are more than usually likely to be deaf. 

So much is clear for Alx3. Mice deficient in this gene show a range 
of neural-tube closure defects, the incidence of which is reduced by 
folic acid (S. Lakhwani et al. Dev. Biol. 344, 869-880; 2010). This may 
explain why human mothers deficient in this vitamin run the risk of 
giving birth to babies with spina bifida. Again in humans, recessive 
mutations in ALX3 produce a series of facial malformations called 
frontorhiny, also related to failure of the facial bones to knit properly 
(S. R. F. Twigg et al. Am. J. Hum. Genet. 84, 698-705; 2009). The script
runs deep, with many layers of meaning. 

Did Tzinacan finally decipher the God’s script? The answer is yes: 
the jaguar’s fur encoded a spell which, if recited out loud, would make 
the prison vanish. But Tzinacan chose not to use it because, in the act 
of decipherment, he became a god himself.




Get real 


Researchers must show policymakers that 
scientific evidence is far from academic. 


Grammar and Twitter rarely sit happily, so it would be churlish
to point out that when the Welsh MP Glyn Davies tweeted
at the weekend: “Nothing more irritating than academics
rubbishing the efforts of those operating at the sharp end, without
facing up to the hard decisions”, he was inadvertently complaining
that people at the sharp end (himself included presumably) do not
confront hard decisions.

Besides, the next social-media missive from Davies made his
position clear: “Personally, never thought of academics as ‘experts’.
No experience of the real world.” His first point there might — just —
be semantically defensible: academics, by one definition, are full-time
scholars; whereas experts can be classed as those who have learned
not through study but through experience. But it was his second
assertion that prompted most of the angry backlash, and the inevitable
hashtag response #realworldacademic that was still going strong
as Nature went to press.

(Replies to Davies ranged from “Practiced medicine in Intensive
Care Unit and emergency medicine while I was doing a PhD” to
“Dude, you literally work in a palace.”)

There's no need for Nature to tell its readers — mostly academics —
that they have experience of the real world. They live it every day; and,
for many, the realities of this academic life are starting to bite down
hard. As we explored in a special issue last week, the real world of
academia for many young researchers is insecure and under increasing
pressure. Many are looking to leave. (And when they do, Davies and
others please note, they seem to flourish.)

The popular image of an academic as aloof, privileged and out of 
touch — if it ever was true — is now redundant. But then so is the 
popular view that backbench MPs are, well, aloof, privileged and 
out of touch. In most cases, both groups work harder, and with more 
selfless goals, than critics claim. By their nature, those who study 
the science of what is probable will come into conflict with those 
who practise the art of what is possible. But researchers, along with 
everybody else who criticizes policymakers and elected officials, 
should remember that, as Davies seemed to be trying to point out, it 
is one thing to discuss problems and recommend solutions, and quite 
another to have to make and implement decisions. 

One reason that the MP’s comments seem to have struck a nerve is
that they feed into the popular idea — fuelled by the Brexit campaign
and the rise of Donald Trump — that politicians, and by extension the
wider public, have shifted away from reason and evidence. In a recent
World View column, Bill Colglazier, a former science adviser to the 
US government, argued that this perception could be explained by 
differing attitudes to evidence — and on this point researchers seem 
to have some common ground with Davies. 

Criticized last month for attending a lecture by the prominent 
climate sceptic Matt Ridley at the prominent climate sceptic organi- 
zation the Global Warming Policy Foundation, Davies wrote on his 
blog: “I do not think Government policy should be based on a partial 
view of science. I like to make judgements based on evidence... In 
the end, governments the world over will be guided by evidence — or 
science delivered as evidence.”

The conflict between Davies’ support for evidence and his Twitter 
dismissal of those who seek and provide evidence seems, in the 
real world, to make for a curious paradox. Perhaps an expert could 
look into it.






WORLD VIEW


Geneticists should offer
data to participants


Sarah Nelson was refused access to her own genome data. How long before
volunteers who face this attitude turn away from science?


As a human-genetics researcher, I analyse the DNA of thousands
of anonymous strangers. Earlier this year, I got to experience
the other side of a consent form — and was left disappointed.
When another research group asked me to donate my own genetic material
for their whole-genome sequencing project, I asked in exchange for
access to my raw data — to explore, to play with and just to have on file.
Not surprisingly, my request was refused: the status quo for biomedical
and genetic studies is not to return individual-level data to participants.

I still joined the study, but the irony is not lost on me that my personal
data will be available to thousands of scientists (including me) through
restricted-access databases. As awareness and usefulness of this
information increases, I fear that potential volunteers who are refused access
to their genetic data will become less willing to donate them to science.
The genetics-research community must therefore
update its stance on returning personal data.

Granted, there are well-founded reasons why
studies don't typically return the data. Researchers
rarely recontact participants, and doing so could
draw resources and attention away from the primary
project goals. Research is not medicine, and
returning data can create the misleading impression
that researchers are offering health care. Historically,
there has also been little reason to return
genetic data, because volunteers couldn't access
tools to receive, store or understand them.

But beyond the lab, more people now want and
expect access to all kinds of personal data, a trend
that shows no signs of slowing. Health data are no
exception, as evidenced by the flood of wellness and ‘mobile health’ apps
that are now coming to market. Future generations will take for granted
that our personal computing devices are vehicles for almost unlimited
‘quantified self’ and self-tracking activities.

Several online platforms exist to help people to explore their genetic
data, developed by for-profit companies, academic groups or by
self-taught citizen scientists. Since launching in 2011, the site openSNP has
drawn more than 4,000 users, half of whom have uploaded genetic data.
The DNA.LAND platform has attracted more than 32,000 contributors
since its release last October. Launched in 2008, and therefore one of the
earliest third-party interpretation tools, Promethease reports performing
hundreds of analyses daily. Other tools, such as GEDMatch and
Genome Mate Pro, attract thousands of users who are eager to incorporate
genetic analyses into their genealogical research.

Many scientists are suspicious and occasionally derisive of consumer
or ‘recreational’ genomics. Although these products have their flaws,
they underscore what citizens can and want to do with their genetic
data. For the genetics-research community to maintain its good
relationship with volunteers, it must take these activities more seriously.
Many current large-scale genetics-research studies rely on legacy
collections, and have not had to navigate the new ‘participant-as-owner’
culture. But legacy studies cannot fuel future research indefinitely.

The genetics-research community needs to develop an anticipatory 
infrastructure to return raw data to interested participants. The for- 
mat for returning genotype data would probably vary according to the 
nature of the study — depending, for instance, on whether next-gen- 
eration sequencing or microarray genotyping was performed. Fund- 
ing opportunities should include resources for participant data return. 
Institutional review boards need to be able to review the mechanisms 
that studies propose for returning genetic data. Research groups should 
develop and adopt informed-consent procedures so participants can 
make decisions about acquiring their raw data, including the limitations 
of self-directed interpretation and analysis. 

Making data-return practicable might require building technical sys- 
tems such as secure web interfaces. However, we 
already have secure and robust methods to share 
data within the scientific community, so perhaps 
the necessary change in culture is a bigger hurdle. 

Some research initiatives are already experi- 
menting. In early September, the New York 
Genome Center in New York City released Seeq, 
a smartphone app and research platform through
which individuals can pay a modest fee (around 
US$50) to receive their whole-genome sequence 
and some interpreted reports (such as ancestry 
composition and microbiome profiles). In turn, 
the researchers amass genomes for their own 
research projects. 

Most of the available consumer-genomics tests 
look at a very small portion of the genome, but newer and more power- 
ful exome and whole-genome sequencing options offer more detail at a 
rapidly decreasing cost. MyGene2, a web portal created by researchers at 
the University of Washington in Seattle, enables the sharing of genetic- 
sequencing and medical data across families, physicians and researchers, 
with the goal of tackling rare genetic diseases. Genomics tools such as 
this flourish when people are stewards of their own genetic data. 

This era of big data begs big questions, including who should own 
health and research data. Legally, research participants may not own 
their biological specimens, or the data extracted from them, once these 
have been donated to scientific studies. But as researchers, don't we have
an obligation to respect the individual autonomy of participants seeking 
their raw data? Asking for access is not the same as asking for ownership 
or control, just for a reasonable reciprocity. Let's have the conversation. 
After all, if potential participants can obtain their genetic data from a
growing number of commercial companies, they might turn their backs
on traditional research studies altogether.


Sarah Nelson is a researcher and PhD student at the Institute for 
Public Health Genetics at the University of Washington in Seattle. 
e-mail: sarahcn@uw.edu 
Selections from the
scientific literature 


RESEARCH HIGHLIGHTS 


Bionic plant can 
sense explosives 


By incorporating fluorescent 
carbon nanotubes into spinach 
plants, researchers have turned 
the plants into environmental 
sensors. 

Michael Strano and 
his colleagues at the 
Massachusetts Institute of 
Technology in Cambridge 
coated carbon nanotubes 
with a peptide that binds to 
nitroaromatic compounds, 
which include explosives. 
They embedded the 
nanoparticles into the leaves 
of spinach plants. When 
chemical contaminants 
are absorbed by the roots 
or leaves, they attach to 
the nanotubes, causing the 
nanotubes’ fluorescence to 
decrease by an amount that 
depends on the level of the 
compound. A small detector 
picks up the signal and relays 
it wirelessly to a smartphone. 

Living-plant sensors could 
be deployed to large, remote 
areas for chemical monitoring, 
the authors say. 
Nature Mater. http://dx.doi.org/10.1038/nmat4771 (2016)


Low oxygen resets
the body clock


Cutting ambient oxygen levels
helps mice to recover from a
situation similar to jet lag.

In mammals, circadian
clocks synchronize
metabolism according to the
day-night cycle. Gad Asher
at the Weizmann Institute of
Science in Rehovot, Israel, and
his colleagues found that the
amount of oxygen in the blood
and kidneys of rodents varies
with the time of day. Tests in
cultured mouse cells showed
that rhythmic fluctuations in
oxygen levels synchronized
the circadian clock; this
seemed to happen through
HIF1α, a protein known to
be an oxygen sensor. Mice
exposed to a cycle of light and
dark that was shifted by six
hours to mimic jet lag adapted
faster to the new conditions
when ambient oxygen levels
were decreased either before
or after the shift.

Modulation of oxygen levels
could be a future therapy for
jet lag, say the authors.

Cell Metab. http://doi.org/bsc9
(2016)


Fungi boost bacterium


A study of 25 cheeses finds that a slow-growing bacterium can
outcompete its relatives with the help of fungi.

Benjamin Wolfe at Tufts University in Medford,
Massachusetts, and his colleagues examined the relative
abundance of Staphylococcus bacteria (three species
pictured), which are common in cheese. They found that
Staphylococcus equorum dominated, despite being the
slowest grower in lab tests. In the presence of fungi of the
genus Scopulariopsis, S. equorum lowered its expression of
genes involved in iron uptake and metabolism. The fungi
could be providing the bacterium with freely available iron
needed for growth, saving S. equorum the effort of acquiring
and processing the nutrient, and allowing it to outcompete
other bacteria.

Fungi could be influencing the diversity of other bacterial
communities, including those in humans, the authors say.

mBio 7, e01157-16 (2016)


ASTRONOMY 


Small stars host 
water worlds 


Earth-sized planets covered 
in water may be abundant 
around red dwarfs, the most 
common type of star in the 
Universe. 

Yann Alibert and Willy 
Benz at the University of Bern 
used computer simulations 
to predict the properties 
of planets that could form 
around red dwarfs and host 
liquid water. They found that 
the radius of the planets would 
be 0.5-1.5 times that of Earth, 
with most being around the 
same size as Earth. More than 
90% of the simulated planets 
were at least 10% water by 
mass, suggesting that they 
were completely surrounded 
by deep oceans. 

The authors say that the 
prospects for life on such 
planets are unclear, because 
too much water could 
destabilize the climate. 

Astron. Astrophys. in the press; 
Preprint at https://arxiv.org/abs/1610.03460 (2016)


Weary T cells may 
not recover 


Exhausted immune cells bear 
distinct genetic signatures, and 
may be difficult to revive — a
finding with implications for 
therapies that harness the 
cells. 

Immune cells called T cells 
can become ‘exhausted’ and 
dysfunctional after exposure 
to cancer or chronic infection. 
Two teams — one led by John 
Wherry at the University of 
Pennsylvania in Philadelphia, 
the other by Nir Yosef at the 
University of California, 
Berkeley, and Nicholas 
Haining at the Dana-Farber 
Cancer Institute in Boston, 
Massachusetts — looked at
changes in gene expression
and epigenetic markers 
(chemical changes to DNA 
that do not affect its sequence) 
in mice infected with a virus. 
They found that exhausted 
T cells had a characteristic 
profile that distinguished them 
from functional T cells. 

One of the teams also 
showed that exhausted 
T cells were reactivated by an 
antibody that blocks PD-L1, 
a protein that suppresses 
T-cell responses. However, 
this effect was transient when 
viral levels remained high, 
suggesting that certain kinds 
of immunotherapy may need 
to be combined with other 
treatments to yield lasting 
benefit. 
Science http://doi.org/bsdh; 
http://doi.org/bsdj (2016) 


Noise disrupts 
other senses 


Noise pollution can affect 
how wild animals respond to 
other sensory inputs, such as 
smell. 

Andrew Radford and his 
colleagues at the University 
of Bristol, UK, studied the 
behaviour of wild dwarf 
mongooses (Helogale parvula; 
pictured) that had been 
habituated to the presence of 
human observers. The team 
placed faeces from either a 
predator or a herbivore outside 
the mongoose burrow. When 
ambient natural sounds were 
played, mongooses were quick 
to inspect both types of faeces. 
In response to predator faeces, 
the animals showed increased 
vigilance and stayed close 
to the burrow. By contrast, 
when road noise was played, 
mongooses were slower to 
approach and showed similar
responses to both predator and
herbivore faeces. 

Noise pollution may 
distract the mongooses and 
increase stress, impairing 
the creatures’ natural anti- 
predator behaviour, the 
authors say. 

Curr. Biol. 26, R911-R912
(2016) 


3D-printed device 
shapes ultrasound 


A specially designed lens
can create ultrasound beams
with the potential to precisely 
move, manipulate and destroy 
cell-sized objects. 

Ultrasound beams can be 
made by firing pulses of laser 
light at a lens to create high- 
frequency vibrations. But glass 
lenses can create only relatively 
simple wave patterns. Claus- 
Dieter Ohl and his colleagues 
at Nanyang Technological 
University in Singapore used 
a 3D printer to build polymer 
lenses in 3D curved shapes. 
These lenses generated beams 
just as powerful as those made 
from glass, but their complex 
shapes allowed greater control 
over the beams’ focus in space
and time. 

This could enable complex 
manipulations of minuscule 
objects, say the authors. 

Appl. Phys. Lett. 109, 174102 
(2016) 


FLUID DYNAMICS 


Soft surfaces 
suppress splash 


Splashing occurs when 
droplets strike a stiff, flat 
surface, but a soft material, 
such as silicone gel, can reduce 
or even eliminate splatter. 

A team led by Robert Style
at the Swiss Federal Institute
of Technology in Zurich and
Alfonso Castrejon-Pita at the 
University of Oxford, UK, 
observed ethanol drops falling 
onto silicone gels of varying 
stiffness. Deformations of 
the soft substrates within a 
few microseconds of impact 
absorbed the drops’ kinetic 
energy, decreasing splashing. 
The authors say soft gels 
and elastic polymers could 
be used as inexpensive 
coatings to prevent splashing, 
which could improve many 
technologies, including inkjet 
printers. 
Phys. Rev. Lett. 117, 184502 
(2016) 


ECOLOGY 


River fish feed 
millions 


Total freshwater-fish 
consumption provides for 
the dietary animal-protein 
needs of the equivalent of 
158 million people, with 
poorer nations especially 
dependent on this natural and 
inexpensive source of food. 
Peter McIntyre at the 
University of Wisconsin, 
Madison, and his colleagues 
used data from the Food and 
Agriculture Organization of 
the United Nations to build a
global map of river fisheries, 
which have historically 
received less attention than 
their marine counterparts. 
They found that pressure from 
fishing was most intense in 
areas where biodiversity was 
also highest, raising concerns 
about conservation. The
Mekong (pictured), Amazon
and Niger were some of the 
most heavily fished rivers, 
whereas rivers in the United 
States and Europe saw lower 
than expected catches. 
Declines in river fish 
could be catastrophic for the 
food security of hundreds 
of millions of people, the 
authors say. 
Proc. Natl Acad. Sci. USA http://doi.org/bscf (2016)


Magpies behave 
cooperatively 


A species of magpie is the 
first bird found to show 
cooperative behaviour without 
prompting. 

A team led by Lisa Horn 
at the University of Vienna 
devised apparatus that allowed 
East Asian azure-winged 
magpies (Cyanopica cyana) to 
distribute food (mealworms 
and crickets) to others and 
found that they gave out food 
relatively evenly to group 
members. The authors argue 
that their findings support 
the ‘cooperative breeding 
hypothesis’. This states that
prosocial behaviour — 
helping others at no or low 
cost — evolved in species such 
as humans, whose offspring 
are cared for by not only 
parents, but also other group 
members. 
Biol. Lett. 12, 20160649 (2016) 
SEVEN DAYS


EVENTS 


Antarctic reserve 


In a diplomatic breakthrough, 
24 nations and the European 
Union agreed on 28 October 
to create the world’s largest 
marine reserve, in the 
Southern Ocean. The deal, 
which will take effect in 
December 2017, protects 1.55 
million square kilometres of 
the Ross Sea, a deep Antarctic 
bay 3,500 kilometres south
of New Zealand, from
commercial fishing and 
mineral exploitation. The 
agreement became possible 
because of assent from 
Russia, which had long 
blocked a deal. See page 13 
for more. 


Cholera in Haiti

On 27 October, health
officials announced plans
to vaccinate 820,000 people
against cholera in regions
of southern Haiti that were
devastated by Hurricane
Matthew last month. Damage
to water and sanitation
infrastructure has raised the
risk of a cholera outbreak.
In 2010, after a devastating
earthquake, the country
suffered an epidemic of the
disease that affected some
700,000 people, killing around
9,000. Vaccinations will begin
on 8 November, according
to the Pan American Health
Organization, which will
support the campaign led
by the Haitian Ministry of
Health.


193 km

The distance covered by the
first commercial journey
of a self-driving truck.
Transport firm Otto, owned
by Uber, delivered 51,744
cans of Budweiser beer from
Fort Collins, Colorado, to
Colorado Springs.

Source: Otto


Italy hit by strongest quake in decades

A magnitude-6.6 earthquake struck central Italy
on 30 October, the most powerful such event in
the country since 1980. The epicentre was about
115 kilometres northeast of Rome. The town of
Arquata del Tronto suffered heavy damage and
Norcia’s cathedral was destroyed, but no fatalities
were reported. Italy’s civil protection agency said
that more than 15,000 people are in temporary
accommodation. The quake follows a series
of tremors last week, and many towns in the
region had already been evacuated following the
magnitude-6.2 earthquake that hit the same area
on 24 August and killed nearly 300 people, many
of them in Amatrice (pictured). Geophysicists
have been concerned about continuing activity
in the region’s complex system of faults.


Mosquito test

The world’s biggest test
yet of an unconventional
but promising method
to fight mosquito-borne
diseases will commence in
Rio de Janeiro, Brazil, and
Medellin, Colombia, scientists
announced on 26 October.
Mosquitoes that carry
Wolbachia bacteria — which
hinder the insects’ ability to
transmit Zika, dengue and
other viruses — will be widely
released in the cities over the
next two years, reaching an
estimated 2.5 million people
in each region. See page 17 for
more.


EU drug agency

Ireland’s health minister,
Simon Harris, announced on
25 October that the country
will formally bid to host the
European Medicines Agency
(EMA), which is widely
expected to relocate from its
base in London after Britain
leaves the European Union.
The EMA is responsible for
the evaluation and approval
of drugs marketed in the EU.
Ireland and Spain have both
previously said that they would
like to host the regulator, but
Ireland’s announcement is
the most formal statement of
intent so far. Harris said that
he believes a move to Dublin
would allow the EMA to retain
many of its staff.


Cuban drug first

A cancer research centre in New York will host the first ever US
clinical trial of a biotechnology developed in Cuba, state governor
Andrew Cuomo announced on 26 October. The trial, a collaboration
between Havana’s Center of Molecular Immunology (CIM) and the
Roswell Park Cancer Institute in Buffalo, will test the CIM’s
therapeutic vaccine CIMAvax-EGF in people with lung cancer. The
vaccine is already approved for use in at least five countries, and
researchers at both institutions think that it could eventually be
used to prevent lung cancer in people at risk. The trial is a sign
of thawing relations between Cuba and the United States, following
the announcement last month of a US Treasury policy that authorizes
US scientists to collaborate more freely with their Cuban
counterparts.


Mars crash site

NASA has released more-detailed images of the site on Mars where
the European Space Agency’s Schiaparelli lander met its end last
month. Images taken by NASA’s Mars Reconnaissance Orbiter on
25 October show three crash sites about 1.5 kilometres from each
other (pictured), and suggest that a shallow crater was created by
the impact. Initial indications suggest that a computing error
during Schiaparelli’s six-minute landing manoeuvre may have caused
the craft to believe it was at a lower altitude than it really was
and to jettison its parachute too early. The craft, part of a
European-Russian mission, was intended to test landing technology
for a future Mars mission.


TREND WATCH

Wild populations of mammals, birds, amphibians, fish and other
vertebrates declined by 58% between 1970 and 2012, according to the
Living Planet Report 2016, published on 27 October. Freshwater
populations, which fell by 81%, are thought to be faring worse than
terrestrial ones. Habitat loss is the main threat, with
overexploitation and human-induced climate change also major
culprits. If the trend continues, by 2020 the world will have lost
two-thirds of its vertebrate biodiversity, says the report.

SOURCE: LIVING PLANET REPORT 2016 (WWF INTERNATIONAL, 2016);
GO.NATURE.COM/2FMTSYX


CLIMATE CHANGE 


Sulfur phase-out 


Ships will be banned from 
using high-sulfur fuel starting 
in 2020. The 171 member 
states of the International 
Maritime Organization, 
which regulates international 
shipping, have agreed to limit 
the pollutant to 0.5% in fuels 
by 2020, compared with an 
average of 3.5% today. The 
decision, made on 27 October 
at a meeting in London, will 
reduce sulfur oxide emissions 
and, the organization hopes, 
“have a significant beneficial 
impact on the environment 
and on human health”. But
some environmental groups 
have criticized the deal for 


failing to tackle carbon 
dioxide emissions. The 
shipping industry, a growing 
source of greenhouse-gas 
emissions, is not covered by 
global climate agreements 
such as the 2015 Paris deal. 


Turkish rectors

Under state-of-emergency provisions declared after an attempted
military coup in July, Turkish President Recep Tayyip Erdogan
issued a decree on 29 October giving him the power to directly
appoint state-university rectors without input from the
universities themselves. Previously, universities held elections to
create shortlists of candidate rectors, from which the government
selected three to propose to the president. The decree also
dismisses more than 10,000 civil servants, including 1,267
university academics, on suspicion of having links with terrorist
groups. The dismissed include 24 scholars who signed an ‘Academics
for Peace’ petition in January that called for an end to violence
between government forces and Kurdish separatists. Critics say that
people with no connection to terrorist groups are being targeted.
An earlier decree in September dismissed more than 2,300 university
academics, including 44 who had signed the peace petition.


VERTEBRATE NUMBERS IN DRASTIC DECLINE
Earth’s vertebrate populations fell by 58% between 1970 and 2012,
with human activities much to blame.
[Chart: index value (1970 = 1).] The Living Planet Index is
calculated by tracking the abundance of 14,152 populations of 3,706
vertebrate species.


3 NOVEMBER
China launches the next-generation Long March 5 rocket on its
inaugural flight, from Hainan Island.

8-9 NOVEMBER
Leading researchers explain how their advances break down barriers
in science and society at this year’s Falling Walls conference in
Berlin.
www.falling-walls.com


Prion pioneer dies 


Susan Lindquist, an influential 
molecular biologist, died of 
cancer on 27 October, aged 
67. Lindquist did pioneering 
work on heat-shock proteins, 
which can fix misfolded 
proteins. She was also 
recognized for her work on 
infectious proteins, known 

as prions, in yeast that cause 
Creutzfeldt-Jakob disease and 
other neurological disorders 
in humans. She spent 23 years 
at the University of Chicago 

in Illinois. In 2001, Lindquist 
joined the Whitehead Institute 
in Cambridge, Massachusetts, 
where she served as its first 
female director. 
The Ross Sea in the Southern Ocean is one of the least-altered ecosystems on Earth and teems with a diverse array of marine life.


CONSERVATION 


Giant ocean reserve is a go


International agreement to create world’s largest marine protected area near Antarctica is 
hailed as a diplomatic breakthrough. 


BY QUIRIN SCHIERMEIER 


It is a milestone for ocean conservation and Russia’s relationship
with the rest of the world. After years of unsuccessful talks,
24 nations and the European Union agreed on 28 October to create
the largest marine reserve on Earth, around twice the size of
Texas, in the Southern Ocean off the coast of Antarctica.

The international deal protects 1.55 million square kilometres of
the Ross Sea, a deep Antarctic bay 3,500 kilometres south of New
Zealand, from commercial fishing and mineral exploitation. It is
the first time that countries have joined together to protect a
major chunk of the high seas, the areas of ocean that are largely
unregulated because they do not fall under the jurisdiction of any
one nation. The deal takes effect in December 2017.

Signed by members of the Commission for 
the Conservation of Antarctic Marine Living 
Resources (CCAMLR) amid cheering at a 
meeting in Hobart, Australia, the deal became 
possible because of assent from Russia, which 
had long blocked the agreement. “Russian support
of any agreement is a very positive signal
in the current political situation,” says Peter
Jones, a specialist in marine environmental 
governance at University College London. 
Scientists hope now to see an acceleration 
of efforts to protect marine areas, in par- 
ticular other ecologically precious regions 
around Antarctica. The designated reserve 
is a “first dent into the notion that we can’t
do anything to protect the high seas”, says
Daniel Pauly, a marine biologist at the 
University of British Columbia in Vancou- 
ver, Canada, who has long sounded the 
alarm over the state of the world’s oceans.

SOURCE: ANTARCTIC OCEAN ALLIANCE

CCAMLR members had discussed the Ross
Sea reserve since the United States and New 
Zealand proposed it in 2012. Observers say 
that Russia's change of heart might have been 
the result of behind-the-scenes discussions on
the issue in recent months between the US 
secretary of state, John Kerry, and his Russian 
counterpart, Sergey Lavrov. 

The Ross Sea is relatively healthy, but fishing 
activity is increasing — and that has begun to 
affect stocks of the predatory Antarctic tooth- 
fish (Dissostichus mawsoni). Also in decline 
is the Antarctic krill (Euphausia superba), a 
shrimp-like crustacean and a key creature in 
the marine food web off Antarctica. 

The deal includes some compromises. These 
might have been necessary to win the support 
of Russia, which operates a large fishing fleet 
in the region, says Jones. 

Most of the reserve (1.2 million km²) will be closed to all
commercial marine activities. But a further 322,000-km² Krill
Research Zone will allow controlled fishing, known as “research
fishing”, and another 110,000 km² will be a Special Research Zone
open for limited fishing of both krill and toothfish (see
‘Safeguarding the sea’). So although the total area of the marine
reserve is bigger than the next-largest, Papahanaumokuakea Marine
National Monument near Hawaii, the region that is completely
restricted is slightly smaller.
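
The zone figures can be checked with a little arithmetic. The sketch below assumes the areas listed in the accompanying ‘Safeguarding the sea’ graphic (1,117,000 km² fully protected, 322,000 km² Krill Research Zone, 110,000 km² Special Research Zone); the variable names are illustrative only, and the graphic’s fully protected figure is somewhat lower than the rounded 1.2 million km² quoted in the text.

```python
# Zone areas in km^2, taken from the article's graphic (names are illustrative).
zones_km2 = {
    "fully_protected": 1_117_000,
    "krill_research": 322_000,
    "special_research": 110_000,
}

# Total area of the reserve implied by the three zones.
total_km2 = sum(zones_km2.values())

# Share of the reserve taken up by each zone, as a percentage.
shares = {name: 100 * area / total_km2 for name, area in zones_km2.items()}

print(f"total: {total_km2:,} km^2")
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

On these numbers the three zones sum to 1,549,000 km², which rounds to the 1.55 million km² headline figure, and a little under three-quarters of the reserve is fully closed to fishing.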

And for now, a ‘sunset clause’ specifies that 
the designated zone will expire in 35 years, 
meaning that it does not fully qualify as a marine 
protected area (MPA) under the strict rules set 
by the International Union for Conservation of 
Nature. “We do regret this,” says Mike Walker, 
project director of the Antarctic Ocean Alliance, 
an environmental group in Washington DC. 


“But we are confident that decision-makers will come to realize
that the best way to conserve the ocean is to protect it forever.”


SAFEGUARDING THE SEA
The newly created marine reserve in the Ross Sea near Antarctica
includes different levels of protection.
Fully protected zone: 1,117,000 km²
Krill Research Zone (allows controlled “research fishing” for krill): 322,000 km²
Special Research Zone (allows limited “research fishing” for krill and toothfish): 110,000 km²
Total marine protected area: 1,550,000 km²


SCIENTIFIC PRAISE
On the whole, scientists reacted enthusiastically to the decision.
The Ross Sea contains one of the least-altered ecosystems on Earth,
says Kirsten Grorud-Colvert, a marine biologist at Oregon State
University in Corvallis. But that ecosystem is vulnerable to human
disturbance and the effects of climate change. “Setting aside an
area free from fishing stresses in this marine reserve provides a
reference point and a place for research to evaluate how systems
respond to climate change, and to learn how to foster resilience,”
she says.

Pauly adds, “It means we will protect one of the last parts of the
world with a functioning natural ecosystem, with a complete array
of marine mammals, seabirds and other marine life.”

But others caution that ocean protection 
zones alone will not stop the decline in marine 
biodiversity, and that they do not provide a 
solution to overfishing because they may just 
move fishing to another spot. “If fishing is the 
problem, then they should reduce fishing pres- 
sure, not move it around,” says Ray Hilborn, a 
fisheries specialist at the University of Washington
in Seattle. “Indeed, MPA might also
stand for ‘Move Problems Elsewhere’.”

Next year, the CCAMLR will discuss further 
proposals to create protected zones of roughly 
similar size off the coast of East Antarctica 
and in the Weddell Sea. Chile and Argentina, 
meanwhile, are working on a proposal to pro- 
tect the high seas surrounding the Antarctic 
Peninsula, the most rapidly warming part of 
the frozen continent.


Young scientists gamble on 
biotech start-ups 


Many are founding their own firms as venture capitalists show increased interest in science. 


BY ERIKA CHECK HAYDEN 


Vindication was three years coming for
Ethan Perlstein. On 19 October, his 
California biotechnology company, 
Perlara, announced a deal with Novartis. The 
Swiss drug giant will test a compound that 
Perlara has identified as a possible treatment 
for a rare childhood disease, and will invest an 
undisclosed sum in the smaller firm. 
Numerous biotech investors turned 
Perlstein away before he started Perlara in 


San Francisco in 2014, because he wasn't the 
tenured professor that most venture capital- 
ists saw as founder material. “They pretty 
much told me to take a hike,” he recalls. 

But he persevered, and is now part of the 
vanguard of young biomedical scientists who 
have started companies instead of taking the 
conventional academic path and pursuing 
postdoctoral studies after their PhDs. Among 
the factors driving this change are an infusion 
of money into early-stage biotech investing, 
the emergence of biotech incubators and the 
scarcity of academic jobs in science.

“We're starting to see a renaissance of 
investors embracing the idea that scientists 
can build businesses,” says Ryan Bethencourt, 
programme director of IndieBio, a biotech 
accelerator in San Francisco that began in 2014. 

Previously, Bethencourt says, investors 
preferred to fund companies started by 
established professors who focused on the 
science, while investors installed a manage- 
ment team to take care of the business side. 
But that has changed as crucial technologies, 


such as genetic sequencing, have become 
cheaper and lab work has become automated. 
The cost of starting biotech companies is fall- 
ing, lowering the risk for investors to fund 
new science-based companies. IndieBio and 
Y Combinator — an information-technology 
incubator in Mountain View, California, that 
started accepting biotech companies in 2014 
— provide funding and mentoring to entrepre- 
neurs in exchange for shares in the companies. 


FORK IN THE ROAD 
Y Combinator, which provides US$120,000 
in seed funding per company, invested in 
Perlara this year; IndieBio, which provides 
$250,000 per start-up, has funded 42 compa- 
nies in a variety of fields. Last year, biotech 
firms in the United States and Europe raised 
$3.5 billion in early-stage financing — more 
than in any previous year, according to the 
consultancy Ernst & Young. Much of this was 
from investors who have already made money 
in technology. 

“Most of the venture guys I know want to 
change the world for the better,” says Dan 
Widmaier, co-founder and chief executive 


of Bolt Threads in Emeryville, California, 
which uses genetic engineering to manufac- 
ture textiles. Widmaier went to work for the 
company three days after completing his PhD 
in 2010. “As they see it, being able to serve up 
an ad faster probably isn’t changing the world 
for the better as much as being able to solve 

climate change or cure disease.” 
Conventional academic paths are also becoming less appealing.
On average, young scientists earn their first US National
Institutes of Health R01 grant, the bread-and-butter support
for most biomedical scientists, at the
age of 42. When Anitha Jayaprakash earned
her genetics PhD from the Icahn School of 
Medicine at Mount Sinai in New York City in 
2014, she saw scientists all around her stuck 
in postdocs. Many had no hope of finding 
their own tenure-track academic jobs — a 
phenomenon that Perlstein has dubbed the 
“postdocalypse”. “It gave me a very depress- 
ing feeling about the whole academic space,” 
says Jayaprakash. So she started Girihlet, a
genetic-sequencing company in Berkeley, 
California, that has received funding from 
IndieBio and other investors. 

Alexander Lorestani felt the same way 
when he left a joint graduate and medical- 
degree programme in 2015 to co-found 
Geltor in San Leandro, California, which 
makes a vegan alternative to animal gelatin. 
He and his co-founder are 29 and 30 years old, 
and felt ready to use science to serve human- 
ity. “I couldn't imagine waiting another five to 
ten years to dive into doing what I think of as 
my life’s work,” Lorestani says. 

It’s not an easy road. Most young bio- 
tech firms fail. Widmaier says that he never 
expected Bolt Threads to raise $90 mil- 
lion and last for 6 years. He says it has been 
rewarding to thrive long enough to be doing 
groundbreaking science, and to have a rare
degree of independence. “Anywhere else, you
join someone else’s vision for what a perfect
workplace is,” he says. “The most valuable
thing about building a company is that you
get to build the place where you go to work
every day.”


HELIOPHYSICS 


Hiccups for US satellite 


Cosmic rays may be inducing glitches in space-weather probe’s computer. 


BY ALEXANDRA WITZE 


A space-weather satellite that is supposed
to alert Earth to incoming solar storms
has temporarily dropped offline six
times in the year since it became operational.
The US craft’s onboard computer may be 
experiencing hiccups caused unexpectedly by 
Galactic cosmic rays. 

The Deep Space Climate Observatory 
(DSCOVR) went out of action most recently 
on 30 October. In each case, it unexpectedly
entered a ‘safe hold’, in which scientific
data stopped flowing and engineers had to 
scramble to try to recover the spacecraft. In 
total, DSCOVR’s space-weather forecasting 
instruments have been offline for more than 
42 hours since 28 October 2015, when the US 
National Oceanic and Atmospheric Adminis- 
tration (NOAA) took the spacecraft over from 
NASA, which built and launched it. 

Workers test solar arrays on the DSCOVR satellite, which is now used to monitor space weather.

Each outage lasts for only a few hours, and the total downtime
amounts to more than 0.5% of its time in space, well within NOAA’s
requirement that the spacecraft operate at least 96% of the time.
The 11 October outage did not significantly affect predictions of a
minor geomagnetic storm that arrived a few days later, says Robert
Rutledge, head of the forecast office at NOAA’s Space Weather
Prediction Center in Boulder, Colorado.
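
As a rough check on the downtime figure, the sketch below assumes 42 hours offline (the article says ‘more than 42’, so this is a lower bound) between NOAA’s takeover on 28 October 2015 and the latest outage on 30 October 2016. The choice of end date and all variable names are assumptions made for illustration, not NOAA’s own accounting.

```python
from datetime import date

# Assumed window: NOAA takeover to the most recent outage named in the article.
start = date(2015, 10, 28)
end = date(2016, 10, 30)

offline_hours = 42            # lower bound: the article says "more than 42 hours"
required_availability = 96.0  # per cent, NOAA's stated requirement

# Convert the window to hours and express downtime as a percentage of it.
total_hours = (end - start).days * 24
downtime_pct = 100 * offline_hours / total_hours
availability_pct = 100 - downtime_pct

print(f"downtime: {downtime_pct:.2f}% of {total_hours} hours")
print(f"availability: {availability_pct:.2f}% (requirement: {required_availability}%)")
print("meets requirement:", availability_pct >= required_availability)
```

With exactly 42 hours the downtime comes out just under 0.5%; any extra offline time nudges it upwards, and either way availability stays far above the 96% threshold.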

But the outages mean that DSCOVR could 
be offline when a major solar storm erupts, 
leaving Earth essentially blind to the incoming
onslaught. “Are they problematic? Yes,” says
Douglas Biesecker, a solar physicist at the 
Boulder centre. 

Other heliophysics spacecraft monitor 
solar eruptions, but DSCOVR delivers 
unique information from its location at
the gravitationally stable L1 point,
about 1.5 million kilometres from Earth 
in the direction of the Sun. The space- 
craft’s instruments measure the speed, 
magnetic field and other properties of the 
charged particles streaming off the Sun. 
Those data translate into better forecasts 
of what could happen when a solar storm 
hits Earth, such as disruptions to satellite 
electronics or fluctuations in electrical 
power grids. 

DSCOVR is NOAA’s main tool for 
forecasting space weather, but it began 
life as Triana, a NASA Earth-observing 
spacecraft built in the late 1990s to gaze 
constantly at the planet. A pet project of 
Al Gore, then the US vice-president, Triana 
was shelved in 2001, then repurposed in 
2008 for space-weather needs. “It was 
never designed from the beginning to be a
space-weather satellite,” says Steven Clarke,
head of NASA’s heliophysics division in
Washington DC. 

The satellite launched on 11 February 
2015 and experienced its first outage four 
months later, when its onboard computer 
spontaneously rebooted. On average, 
the safe holds happen every 74 days, but 
two came just 8 days apart. They are not 
correlated with solar storms. 

A NASA internal review board 
convened to study the problem could 
not definitively pinpoint the cause, but 
concluded that it was most likely to be 
Galactic cosmic rays randomly striking 
the spacecraft, causing high-energy ion- 
ization that reboots the computer. The 
computer, which was built by NASA in 
2000, contains a processor card that is 
similar to those flying aboard many other 
missions and is meant to withstand the 
radiation hazards of deep space. 

NOAA does have a back-up data 
stream, from the Advanced Composi- 
tion Explorer (ACE) spacecraft that is 
also orbiting the L1 point. That was the 
primary source of solar-wind data until 
NOAA forecasters switched to DSCOVR
in July. But ACE is 19 years old, and 
intense solar storms can swamp its fore- 
casting instruments. 

NOAA has requested an extra 
US$1.5 million from Congress to improve 
how it handles DSCOVR data, including 
its responses to the outages. The satel- 
lite is supposed to last until 2022, when 
a follow-up mission is slated for launch. 
Historically, NOAA has cobbled together 
its space-weather observations where 
and when it could, but the US govern- 
ment is starting to demand a more coher- 
ent approach. On 13 October, President 
Barack Obama signed an executive order 
that, among other things, requires NOAA 
to “ensure the continuous improvement 
of operational space weather services”.

Crop yields and water efficiency could be improved with the use of better gene-editing techniques.


A better way to 
hack plant DNA 


As gene editing opens doors, crop researchers are 
hamstrung by the need for more-modern tools. 


BY HEIDI LEDFORD 


When crop engineers from around
the world gathered in London in 
late October, their research goals 


were ambitious: to make rice that uses water 
more efficiently, cereals that need less fertilizer 
and uberproductive cassava powered by turbo- 
charged photosynthesis. 

The 150 attendees of the Crop Engineer- 
ing Consortium Workshop were awash with 
ideas and brimming with molecular gadgets. 
Thanks to advances in synthetic biology and 
automation, several projects boasted more 
than 1,000 engineered genes and other molec- 
ular tools, ready to test in a researcher's crop of 
choice. But that is where they often hit a wall. 
Outdated methods for generating plants with 
customized genomes — a process called trans- 
formation — are cumbersome, unreliable and 
time-consuming. 

Asked what hurdles remain for the field, 
plant developmental biologist Giles Oldroyd 
of the John Innes Centre in Norwich, UK, had 
a ready answer: “The big thing would be to 
improve plant transformation,” he said.

“What we're all facing is this delivery prob- 
lem,” says Dan Voytas, a plant biologist at the 
University of Minnesota in Saint Paul. “We 
have powerful reagents, but how do you get 
them into the cells?” 

At issue is the decades-old problem that it 
is difficult to modify plant genomes and then 
regenerate a whole plant from a few trans- 
formed cells. Genome-editing techniques 
such as CRISPR-Cas9 hold out the promise 
of sophisticated crop engineering that would 
once have been unthinkable — making it all 
the more frustrating when researchers run up 
against an old roadblock. 

On 28 September, the US National Science 
Foundation (NSF) recognized this frustration 
by announcing that it would fund research 
into better transformation methods. That 
focus is one of four in a new plant-genome 
research programme that will receive a total 
of US$15 million. 

“Everybody agrees that it really is the bottleneck
for genome engineering,” says Neal
Stewart, a plant biologist at the University of 
Tennessee, Knoxville, who co-organized an
NSF workshop about plant transformation last 
November. “And I think there’s enough interest 
now in trying to come up with ways to fix the 
problem for major crops.” 


OBSTINATE CROPS 

Some plants, such as the diminutive thale cress 
(Arabidopsis thaliana), the ‘lab rat’ of plants, 
are easily transformed using a bacterium that 
can add genes to plant genomes. Research- 
ers insert the genes they want to test into the 
bacterium (Agrobacterium tumefaciens), and 
then coax the microbe to infect the reproduc- 
tive cells of the plant. When the plant then 
produces offspring, some of them express the 
new genes. 

But this does not work for many crops, and 
use of Agrobacterium triggers extra scrutiny 
from government agencies such as the US 
Department of Agriculture because it is con- 
sidered a plant pest. As an alternative, research- 
ers can use ‘gene guns’ that fire DNA-coated 
gold beads into plant cells. Those cells are then 
bathed in growth hormones and coaxed to 
regenerate a full plant. Some plants, such as 
maize (corn), readily bend to this treatment. 
Others, such as wheat and sorghum, do not. 


For recalcitrant crops, it can take months of 
painstaking cell-culture work — optimizing 
growth conditions and hormone concentra- 
tions — to regenerate the full plant. The con- 
ditions needed for success vary not only from 
crop to crop, but also between plants of the 
same species. 

Plant-transformation experts are a rare 
breed, says Joyce van Eck, one such special- 
ist at Cornell University in Ithaca, New York. 
“There's a lot of art in what we do,” she said 
at the London workshop. “It’s difficult to find 
people with that training.”

Add to that a dearth of funding for new 
methods, and researchers are left having to 
rely on decades-old techniques. 


A BETTER WAY

But that could change as the hunt for alter- 
natives heats up. Stewart and his collabora- 
tors have developed a robot that performs an 
established technique called protoplast trans- 
formation faster and more accurately than is 
possible by hand. The method uses enzymes 
to digest the cell wall, making it easier for 
researchers to introduce new genes. The prob- 
lem of regenerating the whole plant, however, 
remains. Researchers used a similar approach, 
without robots, to perform CRISPR-Cas9
gene editing in a variety of plants, including 
lettuce and rice. 

The cell-culture steps are still difficult. 
Stewart says that one person in his lab laboured 
unsuccessfully for two years to transform a tall 
grass that he uses for biofuel research. But the 
declining cost of enzymes allows researchers to 
perform more experiments, and the robotics 
improve throughput. Stewart is so enamoured
with his creation that he has composed a song
for it. “It’s our baby right now,” he says.

Others, such as Fredy Altpeter of the 
University of Florida in Gainesville, are hunt- 
ing for a suite of genes that, when switched on 
or off, would make plant cells more amenable 
to transformation and regeneration from 
culture. “I think it will lead to much broader 
application of this technology, and will enable 
people who are not experts in cell culture to 
make those improvements,” he says.

But researchers can’t afford to wait for those 
developments, says Oldroyd. His project, 
which aims to develop cereals that use nitro- 
gen from the soil more efficiently, will plough 
through tests of hundreds of transgenes using 
the old, cumbersome methods. “We just have 
to be patient,” he says.

INFECTIOUS DISEASE


Infected mosquitoes fight Zika 


South America hosts largest trials yet of Wolbachia-infected insects to combat viruses. 


BY EWEN CALLAWAY 


Two South American metropolises are enlisting bacterium-infected
mosquitoes to fight Zika. The effort is the world’s biggest test
yet of an unconventional but promising approach to quell
mosquito-borne diseases.

Mosquitoes that carry Wolbachia bacteria 
— which hinder the insects’ ability to trans- 
mit Zika, dengue and other viruses — will be 
widely released in Rio de Janeiro, Brazil, and 
Medellin, Colombia, over the next two years, 
scientists announced on 26 October. The 
deployments will reach around 2.5 million 
people in each city. “This really has the poten- 
tial to be a game changer in terms of vector 
control — the biggest thing since DDT,” says 
Philip McCall, a medical entomologist who 
studies mosquito control at the Liverpool 
School of Tropical Medicine, UK. 

Small numbers of Wolbachia-infected 
mosquitoes have already been released in both 
Rio de Janeiro and Medellin. But large biomedi- 
cal funders have now announced US$18 mil- 
lion to scale up the efforts. “We really want to 
deploy quite quickly in large sections of these 


cities,” says Scott O’Neill, a microbiologist at 
Monash University in Melbourne, Australia, 
and head of the Eliminate Dengue Program, 
which is leading the mosquito releases. Footing 
the bill are the Bill & Melinda Gates Founda- 
tion in Seattle, Washington, the London-based 
Wellcome Trust and the US and UK 
governments. Brazil's government 

is chipping in with an extra 
$3.7 million, O’Neill says. 


VIRUS BLOCKERS 


Wolbachia pipientis plagues some 60% of insect
species worldwide, but doesn’t naturally infect
Aedes aegypti mosquitoes, the species that
transmits Zika,
dengue and numerous other 
viruses. The bacteria can hin- 
der their hosts’ fertility and influence the sex 

of offspring. They can also block viruses from 
reproducing in infected fruit flies and mosqui- 
toes, as O’Neill and his colleagues discovered in 
the late 1990s. The team later developed labora- 
tory populations of infected A. aegypti. 


Aedes aegypti mosquitoes spread 
Zika, dengue and other viruses. 


When tens of thousands of these mosquitoes were released near the
small city of Cairns in northern Australia in 2011, the bacteria
spread rapidly among local A. aegypti mosquitoes; 90% of mosquitoes
in a targeted area were infected within weeks. Tests in Indonesia
and Vietnam found similar success. It’s not yet clear whether the
strategy also reduces rates of dengue infections in humans, but
O’Neill’s team has begun a trial in Yogyakarta, Indonesia, to find
out.

The Eliminate Dengue team started releasing Wolbachia-infected
mosquitoes in two Rio de Janeiro neighbourhoods in 2014, and in a
suburb of Medellin in 2015. The bacteria block the replication of
Zika and chikungunya virus (which caused widespread outbreaks in
Latin America and the Caribbean in 2013-14). O’Neill’s team hopes
that the scaled-up deployments can combat those diseases, as well
as dengue, which infected an estimated 1.6 million people in Brazil
last year. The researchers plan to release mosquitoes in waves,
survey the insects for infection and track the incidence of disease
in areas with and without infected mosquitoes.

Other scientists are also testing Wolbachia 
to control mosquitoes. In Singapore, 
officials plan to release male A. aegypti mos- 
quitoes infected with a strain of Wolbachia 
that renders their offspring infertile. A US 
biotechnology company is seeking approval 
to use a similar approach to combat related 
Asian tiger mosquitoes (Aedes albopictus), 
which carry dengue and chikungunya. In 
Guangzhou, China, scientists are releas- 
ing A. albopictus mosquitoes infected with 
Wolbachia each week, in large-scale field 
trials. And researchers in French Polynesia 
are trying the same strategy on another 
species of tiger mosquito. 


MORE EVIDENCE NEEDED 

Wolbachia has an impressive ability to 
surge through wild mosquito populations, 
says McCall, but proving that this limits 
human infections will be critical before the 
approach can find widespread use. If Wol- 
bachia is to make a dent in mosquito-borne 
diseases, the technique will also have to be 
cost-effective and long-lasting, he adds. “If 
it works, it will be truly remarkable, but it 
has to still be working in ten years.” 

Another hurdle facing the tests in Rio 
de Janeiro and Medellin is the size of the 
cities — especially Rio, with its densely 
populated and hard-to-access favelas, says 
McCall. But if Wolbachia can combat Zika, 
dengue and chikungunya in such environ- 
ments, “there is a very strong case for doing 
it for a range of other large cities”, says Mike 
Turner, acting director of science at the 
Wellcome Trust. Widespread deployment 
of Wolbachia-infected mosquitoes would 
probably also depend on endorsement from 
the World Health Organization, he adds. 

Public support could make or break the 
Wolbachia approach, says O’Neill, whose 
team spent years engaging with communi- 
ties in Australia before deploying mosqui- 
toes there. Wolbachia is already widespread 
among insects and it cannot infect humans, 
he notes. In Australia, researchers recruited 
schoolchildren, whom they dubbed Wol- 
bachia warriors, to rear the eggs at home, 
learn about their development and then 
release the mosquitoes. 

In Colombia, the Eliminate Dengue 
team has worked with “Casa Wolbachia” 
families to help with the release of mos- 
quitoes, and even written salsa songs about 
the bacteria, says co-principal investigator 
Jorge Osorio, a pathobiologist at the 
University of Wisconsin–Madison. “We have 
communities asking us to spread more 
mosquitoes,” he says. 
Q&A Joshua Gordon 
Psychiatry needs 
more mathematics 


The US National Institute of Mental Health (NIMH) has a new director. On 12 September, 
psychiatrist Joshua Gordon took the reins at the institute, which has a budget of US$1.5 billion. 
He previously researched how genes predispose people to psychiatric illnesses by acting on neural 
circuits, at Columbia University in New York City. His predecessor, Thomas Insel, left the NIMH 
to join Verily Life Sciences, a start-up owned by Google’s parent company Alphabet, in 2015. 
Gordon says that his priorities at the NIMH will include “low-hanging clinical fruit, neural 
circuits and mathematics — lots of mathematics”, and explains to Nature what that means. 


What do you plan to achieve in your first year? 
I won't be doing anything radical. I am just 
going to listen to and learn from all the stake- 
holders — the scientific community, the 
public, consumer advocacy groups and other 
government offices. But I can say two general 
things. 

In the past 20 years, my two predecessors, 
Steve Hyman [now director of the Stanley 
Center for Psychiatric Research at the Broad 
Institute in Cambridge, Massachusetts] and 
Tom Insel, embedded into the NIMH the idea 
that psychiatric disorders are disorders of the 
brain, and to make progress in treating them 
we really have to understand the brain. I will 
absolutely continue this legacy. This does not 
mean we are ignoring the important roles of 
the environment and social interactions in 
mental health — we know they have a funda- 
mental impact. But that impact is on the brain. 
Second, I will be thinking about how NIMH 
research can be structured to give payouts in 
the short, medium and long terms. 


How has neuroscience changed since you 
completed your medical residency in 2001? 
The advent of incredibly powerful tools to 
observe and alter activity in a subset of neu- 
rons, such as optogenetics, has been transfor- 
mational. It is allowing us to get at questions 
of how neural circuits produce behaviour — a 
research approach that may soon generate new 
treatments for psychiatric disorders. 


Which of the recent NIMH programmes do you 
find particularly exciting? 
One is the Human Connectome Project. The 
project has scanned the brains of more than 
1,000 healthy people to generate individual 
maps of their neural circuitry, the ‘wiring’ in 
their brains that accounts for their particular 
personalities. At the NIMH, we have created 
standardized databases, designed by the sci- 
entific community, to store this information. 
The Human Connectome Project is going 
to be a tremendous resource for the field — 
maybe not quite as impactful as the Human 
Genome Project, but on that scale, I think. 
A clinical programme that deserves as 
much attention, but perhaps doesn’t get it, is 
the Coordinated Specialty Care project for 
individuals facing their first psychotic epi- 
sode. Some small studies have shown that 
coordinating different clinical and social- 
support programmes helps individuals to 
cope better. 


Is this an example of ‘low-hanging fruit’? 

Yes. We are now looking for similarly signifi- 
cant clinical problems where good, evidence- 
based interventions exist but are not widely 
adopted. For example, we have a range of 
screening tools that we think can help reduce 
the suicide rate, which has been rising in the 
United States. It could be advantageous to 
incorporate universal suicidality screening as 
a matter of routine into all emergency rooms. 


What about medium-term payouts? 

Neural circuits could be delivering treatments 
in 10 or 15 years. We don't yet know exactly 
which circuits we would want to modify to 
treat psychiatric disorders in humans. But now 
is the time to start thinking about which tools 
we are going to need to make this translational 
step possible, and invest in them. 

Most work on neural circuits has been done 
in genetically modified mice, where it is rela- 
tively easy to control the activity of a few very 
specific cells in a particular brain area using 
tools such as optogenetics. We'll need safe 
methods for humans. Should we be thinking 
in terms of viruses that can be directed to, and 
change activity in, specific neurons? Or should 
we be thinking of ways to stimulate or inhibit 
these cells indirectly, using transcranial magnetic 
stimulation or deep-brain stimulation? 


And the long term? 

The really transformative treatments that are 
going to change mental-health care in the long 
term will depend on us learning how the brain 
works as a whole. We are all tempted to reduce 
the huge complexity of the brain into under- 
standable chunks. But to appreciate and exploit 
that complexity, we will need to be able to inte- 
grate everything we know, from molecular 
biology to behaviour, into our models of how 
the brain works. That requires serious math. 

How does the structure of a neuron affect 
its integration into a circuit? How does that 
circuit affect the neural system that it fits into? 
How does the dynamic activity in these neural 
systems drive behaviour? Fully characterizing 
each of these levels and then integrating them 
across scales requires a level of mathematical 
rigour that most of us, including myself, have 
not really brought to bear on the problem. 


Isn’t the mathematics going to get very 
difficult for neuroscientists? 

It's not so difficult — I'm not saying that we 
are going to need string theorists! It’s just 
a question of appropriate training in math 
for students. In the future, I hope that every 
experimentalist will also be a theoretician. 
But at this stage we need to encourage experi- 
mental neurobiologists to form long-term 
interdisciplinary collaborations with theo- 
reticians, mathematicians or physicists. We 
need to inject more math into every level of the 
NIMH portfolio. Math can also have a short- 
term impact in psychiatry for things such as 
predicting individual responses to drugs and 
improving precision medicine more generally. 


Are non-human primates still necessary in 
neuroscience research? 

Most of our knowledge about the brain has 
been gained in mice. It is hard for me to believe 
that we'll really be able to translate the knowledge 
that we have won in mice into the design 
of new treatments for humans without going 
through an intermediate species with an elabo- 
rated prefrontal cortex and a large brain. So 
unfortunately, yes, I think we do still need to 
use non-human primates. We need to do so 
judiciously, though — the welfare of animals 
is fundamental, and we need to minimize the 
numbers of all of the animals that we use. 


In a move to a circuits-based approach, the 
NIMH introduced the Research Domain 
Criteria (RDoC), which encourages clinical 
researchers to investigate specific behaviours 
rather than broad diagnoses. It is widely 
disliked — will you be maintaining it? 

Clinical neuroscience has typically tried to 
identify the neurobiology that underlies diag- 
noses [such as depression]. That hasn't got 
us very far. Maybe if we instead try to under- 
stand the neurobiology underlying the various 
domains of behaviour [such as apathy], we'll 
get better insight. I see RDoC as something 
potentially very valuable, something I am 
likely to keep — although it may need a few 
tweaks to extract the most value out of it. 


INTERVIEW BY ALISON ABBOTT 
This interview has been edited for length and clarity. 
See go.nature.com/2f88hif for a longer version. 
Can wind and solar 
fuel Africa’s future? 


With prices for renewables dropping, many countries in Africa 
might leap past dirty forms of energy towards a cleaner future. 


BY ERICA GIES 


At the threshold of the Sahara Desert near Ouarzazate, 
Morocco, some 500,000 parabolic mirrors run in neat rows 
across a valley, moving slowly in unison as the Sun sweeps 
overhead. This US$660-million solar-energy facility opened 
in February and will soon have company. Morocco has committed to 
generating 42% of its electricity from renewable sources by 2020. 
Across Africa, several nations are moving aggressively to develop 
their solar and wind capacity. The momentum has some experts won- 
dering whether large parts of the continent can vault into a clean future, 
bypassing some of the environmentally destructive practices that have 
plagued the United States, Europe and China, among other places. 
“African nations do not have to lock into developing high-carbon 
old technologies,” wrote Kofi Annan, former secretary-general of the 
United Nations, in a report last year¹. “We can expand our power 
generation and achieve universal access to energy by leapfrogging into new 
technologies that are transforming energy systems across the world.” 
That's an intoxicating message, not just for Africans but for the entire 
world, because electricity demand on the continent is exploding. Africa's 
population is booming faster than anywhere in the world: it is expected 
to almost quadruple by 2100. More than half of the 1.2 billion people liv- 
ing there today lack electricity, but may get it soon. If much of that power 
were to come from coal, oil and natural gas, it could kill international 
efforts to slow the pace of global warming. But a greener path is possible 
because many African nations are just starting to build up much of their 
energy infrastructure and have not yet committed to dirtier technology. 
Several factors are fuelling the push for renewables in Africa. More 
than one-third of the continent’s nations get the bulk of their power 
from hydroelectric plants, and droughts in the past few years have made 
that supply unreliable. Countries that rely primarily on fossil fuels 
have been troubled by price volatility and increasing regulations. At 
the same time, the cost of renewable technology has been dropping 
dramatically. And researchers are finding that there is more potential 
solar and wind power on the continent than previously thought — as much 
as 3,700 times the current total consumption of electricity. 

Jeffreys Bay Wind Farm in South Africa generates enough energy to power 
100,000 homes there. 
MAINSTREAM RENEWABLE POWER 

This has all led to a surging interest in green power. Researchers are 
mapping the best places for renewa- 
ble-energy projects. Forward-look- 
ing companies are investing in solar 
and wind farms. And governments 
are teaming up with international- 
development agencies to make the 
arena more attractive to private firms. 

Yet this may not be enough to pro- 
pel Africa to a clean, electrified future. 
Planners need more data to find the 
best sites for renewable-energy pro- 
jects. Developers are wary about 
pouring money into many countries, especially those with a history of 
corruption and governmental problems. And nations will need tens of 
billions of dollars to strengthen the energy infrastructure. 

Still, green ambitions in Africa are higher now than ever before. Eddie 
O'Connor, chief executive of developer Mainstream Renewable Power 
in Dublin, sees great potential for renewable energy in Africa. His com- 
pany is building solar- and wind-energy facilities there and he calls it 
“an unparalleled business opportunity for entrepreneurs”. 


POWER PROBLEMS 

Power outages are a common problem in many African nations, but 
Zambia has suffered more than most in the past year. It endured a 
string of frequent and long-lasting blackouts that crippled the economy. 
Pumps could not supply clean water to the capital, Lusaka, and indus- 
tries had to slash production, leading to massive job lay-offs. 

The source of Zambia's energy woes is the worst drought in southern 
Africa in 35 years. The nation gets nearly 100% of its electricity from 
hydropower, mostly from three large dams, where water levels have plum- 
meted. Nearby Zimbabwe, South Africa and Botswana have also had to 
curtail electricity production. And water shortages might get worse. Pro- 
jections suggest that the warming climate could reduce rainfall in south- 
ern Africa even further in the second half of the twenty-first century. 

Renewable energy could help to fill the gap, because wind and solar 
projects can be built much more quickly than hydropower, nuclear 
or fossil-fuel plants. And green-power installations can be expanded 
piecemeal as demand increases. 

Egypt, Ethiopia, Kenya, Morocco and South Africa are leading the 
charge to build up renewable power, but one of the biggest barriers is 
insufficient data. Most existing maps of wind and solar resources in Africa 
do not contain enough detailed information to allow companies to select 
sites for projects, says Grace Wu, an energy researcher at the University of 
California, Berkeley. She co-authored a report² on planning 
renewable-energy zones in 21 African countries, a joint project by the Lawrence 
Berkeley National Laboratory (LBNL) in California and the International 
Renewable Energy Agency (IRENA) in Abu Dhabi. The study is the most 
comprehensive mapping effort so far for most of those countries, says Wu. 
It weighs the amount of solar and wind energy in the nations, along with 
factors such as whether power projects would be close to transmission 
infrastructure and customers, and whether they would cause social or 
environmental harm. “The IRENA-LBNL study is the only one that has 
applied a consistent methodology across a large region of Africa,” says 
Wu. High-resolution measurements of wind and solar resources have 
typically been done by government researchers or companies, which 
kept tight control of their data. The Berkeley team used a combination of 
satellite and ground measurements purchased from Vaisala, an environ- 
mental monitoring company based in Finland that has since made those 
data publicly available through IRENA’s Global Atlas for Renewable 
Energy. The team also incorporated geospatial data — the locations of 
roads, towns, existing power lines and other factors — that could influ- 
ence decisions about where to put energy projects. “If there's a forest, 
you don’t want to cut it down and put a solar plant there,” says co-author 
Ranjit Deshmukh, also an energy researcher at Berkeley. 

The amount of green energy that could be harvested in Africa is 
absolutely massive, according to another IRENA report³, which 
synthesized 6 regional studies and found 
potential for 300 million megawatts 
of solar photovoltaic power and more 
than 250 million megawatts of wind 
(see ‘Power aplenty’). By contrast, 
the total installed generating capac- 
ity — the amount of electricity the 
entire continent could produce if all 
power plants were running at full 
tilt — was just 150,000 megawatts at 
the end of 2015. Solar and wind power 
accounted for only 3.6% of that. 
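
As a rough check on the scale of these figures, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the constants are the numbers quoted above, the function names are invented for this sketch, and the wind figure is treated as exactly 250 million megawatts although the report says "more than".

```python
# All values in megawatts, taken from the figures quoted in the text.
SOLAR_PV_POTENTIAL_MW = 300e6    # estimated solar photovoltaic potential
WIND_POTENTIAL_MW = 250e6        # estimated wind potential (a lower bound)
INSTALLED_CAPACITY_MW = 150_000  # total installed capacity at the end of 2015
SOLAR_WIND_SHARE = 0.036         # solar and wind share of installed capacity


def potential_to_installed_ratio():
    """Combined solar and wind potential divided by installed capacity."""
    combined = SOLAR_PV_POTENTIAL_MW + WIND_POTENTIAL_MW
    return combined / INSTALLED_CAPACITY_MW


def installed_solar_wind_mw():
    """Installed solar and wind capacity implied by the 3.6% share."""
    return INSTALLED_CAPACITY_MW * SOLAR_WIND_SHARE


if __name__ == "__main__":
    print(f"Potential is about {potential_to_installed_ratio():,.0f} times installed capacity")
    print(f"Installed solar and wind: about {installed_solar_wind_mw():,.0f} MW")
```

Under these assumptions the combined potential is roughly 3,700 times the installed capacity, and the 3.6% share corresponds to about 5,400 megawatts of solar and wind.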

The estimate of wind resources came as a surprise, says Oliver Knight, 
a senior energy specialist for the World Bank’s Energy Sector Manage- 
ment Assistance Program in Washington DC. Although people have long 
been aware of Africa's solar potential, he says, as of about a decade ago, 
few local decision-makers recognized the strength of the wind. “People 
would have told you there isn’t any wind in regions such as East Africa.” 

The World Bank is doing its own studies, which will assess wind speeds 
and solar radiation at least every 10 minutes at selected sites across tar- 
get countries. It will ask governments to add their own geospatial data, 
and will combine all the information into a user-friendly format that is 
freely available and doesn’t require advanced technical knowledge, says 
Knight. “It should be possible for a mid-level civil servant in a developing 
country to get online and actually start playing with this.” 


SOUTH AFRICA LEADS 

In the semi-arid Karoo region of South Africa, a constellation of bright 
white wind turbines rises 150 metres above the rolling grassland. 
Mainstream Renewable Power brought this project online in July, 
17 months after starting construction. The 35 turbines add 80 megawatts 
to South Africa’s supply, enough to power about 70,000 homes there. 

The Noupoort Wind Farm is just one of about 100 wind and solar 
projects that South Africa has developed in the past 4 years, as prices 
fell below that of coal and construction lagged on two new massive coal 
plants. South Africa is primed to move quickly to expand renewable 
energy, in part thanks to its investment in data. 

Environmental scientist Lydia Cape works for the Council for 
Scientific and Industrial Research, a national lab in Stellenbosch. She 
and her team have created planning maps for large-scale wind and solar 
development and grid expansion. Starting with data on the energy 
resources, they assessed possible development sites for many types of 
socio-economic and environmental impact, including proximity to elec- 
tricity demand, economic benefits and effects on biodiversity. 

The South African government accepted the team’s recommendations 
and designated eight Renewable Energy Development Zones that are close 
to consumers and to transmission infrastructure — and where power pro- 
jects will cause the least harm to people and ecosystems. They total “about 
80,000 square kilometres, the size of Ireland or Scotland, roughly”, says 
Cape. The areas have been given streamlined environmental authoriza- 
tion for renewable projects and transmission corridors, she says. 

But for African nations to go green in a big way, they will need a huge 
influx of cash. Meeting sub-Saharan Africa’s power needs will cost 
US$40.8 billion a year, equivalent to 6.35% of Africa’s gross domestic 
product, according to the World Bank. Existing public funding falls 
far short, so attracting private investors is crucial. Yet many investors 
perceive African countries as risky, in part because agreements there 
require long and complex negotiations and capital costs are high. “It’s 
a real challenge,” says Daniel Kammen, a special envoy for energy for 
the US Department of State and an energy researcher at the University 
of California, Berkeley. “Many of these countries have not had the best 
credit ratings.” 

POWER APLENTY 
Studies of some African nations suggest that they could harvest vast 
amounts of power from wind turbines and solar photovoltaic (PV) projects. 
The potential for solar and wind power far exceeds estimates of electricity 
demand in 2030. (Chart of wind potential, solar PV potential and 2030 
demand for South Africa, Angola, Zambia, Zimbabwe, Tanzania, Kenya, Egypt 
and Ethiopia.) 

Elham Ibrahim, the African Union’s commissioner for infrastructure 
and energy, advises countries to take steps to reassure private investors. 
Clear legislation supporting renewable energy is key, she says, along 
with a track record of enforcing commercial laws. 

South Africa is setting a good example. In 2011, it established a 
transparent process for project bidding called the Renewable Energy 
Independent Power Producer Procurement Programme (REIPPPP). 
The programme has generated private investments of more than 
$14 billion to develop 6,327 megawatts of wind and solar. 

Mainstream Renewable Power has won contracts for six wind farms 
and two solar photovoltaic plants through REIPPPP. “This programme 
is purer than the driven snow,” says O'Connor. “They publish their 
results. They give state guarantees. They don't delay you too much.” 
Although the country’s main electricity supplier has wavered in its sup- 
port for renewables, the central government remains committed to the 
programme, he says. “I would describe the risks in South Africa as far 
less than the risks in England in investing in renewables.” 

For countries less immediately attractive to investors, the World Bank 
Group launched the Scaling Solar project in January 2015. This reduces 
risk to investors with a suite of guarantees, says Yasser Charafi, princi- 
pal investment officer for African infrastructure with the International 
Finance Corporation (IFC) in Dakar, which is part of the World Bank 
Group. Through the Scaling Solar programme, the IFC offers low-priced 
loans; the World Bank guarantees that governments will buy the power 
generated by the projects; and the group's Multilateral Investment Guar- 
antee Agency offers political insurance in case of a war or civil unrest. 

Zambia, the first country to have access to Scaling Solar, has won 
two solar projects that will together provide 73 megawatts. Senegal and 
Madagascar were next, with agreements to produce 200 and 40 mega- 
watts, respectively. Ethiopia has just joined, and the IFC will give two 
further countries access to the programme soon; its target is to develop 
1,000 megawatts in the first 5 years. 


MAKING IT FLOW 

That power won't be useful if it can’t get to users. One of the big barriers 
to a clean-energy future in Africa is that the continent lacks robust 
electricity grids and transmission lines to move large amounts of power 
within countries and across regions. 

But that gap also provides some opportunities. Without a lot of 
existing infrastructure and entrenched interests, countries there might 
be able to scale up renewable projects and manage electricity more 
nimbly than developed nations. That’s what happened with the tele- 
phone industry: in the absence of much existing land-line infrastruc- 
ture, African nations rapidly embraced mobile phones. 

The future could look very different from today’s electricity industry. 
Experts say that Africa is likely to have a blend of power-delivery 
options. Some consumers will get electricity from a grid, whereas people 
in rural areas and urban slums — where it is too remote or too expensive 
to connect to the grid — might end up with small-scale solar and wind 
installations and minigrids. 

Still, grid-connected power is crucial for many city dwellers and for 
industrial development, says Ibrahim. And for renewables to become an 
important component of the energy landscape, the grid will need to be 
upgraded to handle fluctuations in solar and wind production. African 
nations can look to countries such as Germany and Denmark, which have 
pioneered ways to deal with the intermittent nature of renewable energy. 
One option is generating power with existing dams when solar and wind 
lag, and cutting hydropower when they are plentiful. Another technique 
shuttles electricity around the grid: for example, if solar drops off in one 
place, power generated by wind elsewhere can pick up the slack. A third 
strategy, called demand response, reduces electricity delivery to multiple 
customers by imperceptible amounts when demand is peaking. 
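
Two of these balancing strategies can be illustrated with a toy calculation for a single time step. The sketch below is not how any real grid operator dispatches power: the function name, the 2% trim limit and all the numbers in the usage note are hypothetical, and the second strategy (moving power between regions) is not modelled.

```python
def dispatch(demand_mw, solar_mw, wind_mw, hydro_capacity_mw,
             max_trim_fraction=0.02):
    """Toy dispatch for one time step.

    Hydropower fills the gap left when solar and wind lag, up to the
    available hydro capacity. If a shortfall remains, demand response
    trims a small fraction of demand (2% by default, an assumed value).

    Returns a tuple (hydro_mw, trimmed_mw, unmet_mw).
    """
    renewables_mw = solar_mw + wind_mw
    # Gap that solar and wind cannot cover on their own.
    gap_mw = max(0.0, demand_mw - renewables_mw)
    # Dams run only as hard as needed, and are cut back when not needed.
    hydro_mw = min(gap_mw, hydro_capacity_mw)
    remaining_mw = gap_mw - hydro_mw
    # Demand response: shave a small share of demand across customers.
    trimmed_mw = min(remaining_mw, demand_mw * max_trim_fraction)
    unmet_mw = remaining_mw - trimmed_mw
    return hydro_mw, trimmed_mw, unmet_mw
```

For example, with 100 MW of demand, 40 MW of solar, 30 MW of wind and 50 MW of hydro capacity, this sketch runs the dams at 30 MW and trims nothing; if solar and wind together exceed demand, hydropower is cut to zero.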

These cutting-edge approaches require a smart grid and infrastructure 
that connects smaller grids in different regions so that they can share 
electricity. Africa has some of these ‘regional interconnections’, but they 
are incomplete. Four planned major transmission corridors will need 
at least 16,500 kilometres of new transmission lines, costing more than 
$18 billion, says Ibrahim. Likewise, many countries’ internal power grids 
are struggling to keep up. 

That's part of what makes working in energy in Africa challenging. 
Prosper Amuquandoh is an inspector for the Ghana Energy Commis- 
sion and the chief executive of Smart and Green Energy Group, an 
energy-management firm in Accra. In Ghana, he says, “there’s a lot of 
generation coming online”. 

The country plans to trade electricity with its neighbours in a West 
African Power Pool, Amuquandoh says, but the current grid cannot han- 
dle large amounts of intermittent power. Despite the challenges, he brims 
with enthusiasm when he talks about the future: “The prospects are huge.” 

With prices of renewables falling, that kind of optimism is spreading 
across Africa. Electrifying the continent is a moral imperative for every- 
one, says Charafi. “We cannot just accept in the twenty-first century that 
hundreds of millions of people are left out.” 


Erica Gies is a freelance journalist in Victoria, British Columbia. 
COMMENT 


The Dark Energy Survey at the Cerro Tololo Inter-American Observatory in Chile is mapping the large-scale structure of the Universe traced by galaxies. 


Good data are not enough 


A vibrant scientific culture encourages many interpretations 
of evidence, argues Avi Loeb. 


This summer, I visited the Mayan 
city of Chichén Itzá in the Yucatan 
Peninsula, Mexico. It has an ancient 
observatory where priest-astronomers made 
detailed astronomical observations around 
AD 600-1200. The ruins — stepped pyramids, 
temples, columned arcades and other 
stone structures — reveal that astronomy was 
at the heart of this sophisticated society. 
The Mayans accurately tracked changes 
in the positions and relative brightness 
of the Sun, Moon, planets and stars. They 
documented their astronomical data in folding 
books called codices, with many more 
quantitative details than other civilizations 
at the time. The priest-astronomers used 
observations and advanced mathematical 
calculations to predict eclipses, and devised a 
365-day solar calendar that was off by just 
one month every 100 years. 
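
The size of that drift can be estimated with a short calculation. The sketch below assumes a mean tropical year of 365.2422 days, a standard modern value that does not appear in the essay, and the function name is invented for illustration.

```python
# Assumed modern value for the mean tropical year, in days (not from the essay).
TROPICAL_YEAR_DAYS = 365.2422
# Length of the calendar year described in the text, in days.
CALENDAR_YEAR_DAYS = 365


def drift_days(years):
    """Days by which a fixed 365-day calendar falls behind the seasons."""
    return (TROPICAL_YEAR_DAYS - CALENDAR_YEAR_DAYS) * years


if __name__ == "__main__":
    print(f"Drift after 100 years: about {drift_days(100):.1f} days")
```

Under this assumption the calendar slips by roughly 24 days per century, which is broadly in line with the "one month every 100 years" quoted above.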

So why, I wondered, didn’t the Mayans 
go further and infer aspects of our modern 
understanding of astronomy? They deter- 
mined the orbital periods of Venus, Mars 
and Mercury around the Sun, but Earth was 
at the centre of their Universe. 

I came to appreciate how limiting prevail- 
ing world views can be. Just as geological and 
other evidence for the great age of Earth was 
rejected before the nineteenth century as 
being hard to square with biblical history, 
the Mayans used their fine data to support 
a mythological culture of astrology. They 
correlated the periodic motions of celes- 
tial objects with human history and, rather 
than seeking a physical explanation for their 
astronomical data, they used it to initiate 
wars or rituals such as human sacrifice. 
Have we learned our lesson, or is today’s 
science similarly trapped by cultural and 
societal forces? Most research funding is 
allocated assuming that the highest-quality 
data will inevitably deliver useful scientific 
interpretation and theoretical concepts, 
which can be tested and refined by future 
data. The astronomy division of the 
US National Science Foundation, for exam- 
ple, devotes most of its funds to major facili- 
ties and large surveys, which are performed 
by big teams to collect better data within 
mainstream paradigms. Fields from particle 
physics to genomics do the same. 

The consequences of a closed scientific 
culture are wasted resources and misguided 
‘progress’ — witness the dead end that was 
Soviet evolutionary biology. To truly move 
forward, free thought must be encouraged 
outside the mainstream. Multiple inter- 
pretations of existing data and alternative 
motivations for collecting new data must be 
supported. 


BLINKERED VIEW 

Mayan cosmologists had high social status. 
They got generous support because of their 
promises to forecast the future. Cosmolo- 
gists today collect vast amounts of exqui- 
site data in surveys of large parts of the sky, 
costing billions of dollars. 

Surveys of the large-scale structure of the 
Universe traced by galaxies include the US 
Baryon Oscillation Spectroscopic Survey 
and the international Dark Energy Survey, 
as well as forthcoming facilities such as 
the US Dark Energy Spectroscopic Instrument, 
the European Space Agency’s (ESA’s) 
Euclid mission, NASA’s Wide Field Infrared 
Survey Telescope and the Large Synoptic 
Survey Telescope in Chile. Others mapping 
the primordial seeds of these structures as 
traced by cosmic microwave background 
radiation include ESA’s Planck satellite, the 
US South Pole Telescope and international 
collaborations such as the Atacama Cosmol- 
ogy Telescope and the Simons Array. 

Such projects have a narrow aim — pin- 
ning down the parameters of one theoretical 
model. The model comprises an expanding 
Universe composed of dark matter, dark 
energy and normal matter (from which 
stars, planets and people are made), with ini- 
tial conditions dictated by an early phase of 
rapid expansion called cosmic inflation. The 
data are reduced to a few numbers. Surprises 
in the rest are tossed away. 

I noticed this bias recently while assessing 
a PhD thesis. The student was asked to test 
whether a data set from a large cosmological 
survey was in line with the standard cosmo- 
logical model. But when a discrepancy was 
found, the student's goal shifted to explain- 
ing why the data set was incomplete. In such 
a culture, the current model can never be 
ruled out, even though everyone knows 
that its major constituents (dark matter, dark 
energy and inflation) are not understood at 
a fundamental level. 

Instead, observers should present results
in a theory-neutral way. Observations should
not converge on one model but aim to find 
anomalies that carry clues about the nature of 


dark matter, dark energy or initial conditions 
of the Universe. Further observations should 
be motivated by testing unconventional inter- 
pretations of those anomalies (such as exotic 
forms of dark matter or modified theories of 
gravity). Vast data sets may contain evidence 
for unusual behaviour that was unanticipated 
when the projects were conceived. If all results 
are expected and planned for, babies may be 
thrown out with the bathwater. 


BLINDED BY BEAUTY 

How each culture views the Universe is 
guided by its beliefs in, for example, math- 
ematical beauty or the structure of reality. If 
these ideas are deeply rooted, people tend 
to interpret all data as supportive of them 
— adding parameters or performing math- 
ematical gymnastics to force the fit. Recall 
how the belief that the Sun moves around 
Earth led to the mathematically beautiful 
(and incorrect) theory of epicycles advo- 
cated by the ancient Greek philosopher 
Ptolemy. 

Similarly, modern cosmology is augmented 
by unsubstantiated, mathematically sophis- 
ticated ideas — of the multiverse, anthropic 
reasoning and string theory. The multiverse 
idea postulates the existence of numerous 
other regions of space-time, to which we 
have no access and in which the cosmologi- 
cal parameters have different values. 

The anthropic argument is then often
applied. It holds that our own region has the
parameters it does (including those of dark
energy and dark matter) because other, more
likely values would not have allowed life to
develop near a star like the Sun in a galaxy
such as the Milky Way. An overlooked
problem with this argument is that, according
to one analysis, life is 1,000 times more
likely to exist 10 trillion years from now
around stars that weigh one-tenth the mass
of the Sun. This means that terrestrial life
might be premature and not the most likely
form of life, even in our own Universe.

The Mayan pyramid at Chichén Itzá in Mexico.

The anthropic argument, which as yet
has no empirical support, suppresses
much-needed efforts to understand dark
energy through an alternative theory that
unifies quantum mechanics and gravity. The
fact that we have not yet converged on such
a theory is indicated by paradoxes in other
areas of physics. For example, information
contained in, say, an encyclopaedia is lost if
it is swallowed by a black hole that ultimately
evaporates into heat known as Hawking
radiation. This contradicts a basic premise
of quantum mechanics that information is
preserved, and is known as the ‘information
paradox’. In addition, currently viable models
of cosmic inflation require fine-tuning
of the conditions of the Universe before and
during inflation.

Cultivating other approaches avoids stalling
progress by investing only in chasing
what might turn out to be ‘epicycles’. After
all, the standard model of cosmology is 
merely a precise account of our ignorance: 
we do not understand the nature of inflation, 
dark matter or dark energy. The model has 
difficulties accounting for the luminous gas 
and stars that we can see in galaxies, while 
leaving invisible what we can easily calculate 
(dark matter and dark energy). This state of 
affairs is clearly unsatisfactory. 

The tendency to establish large projects 
and firm up mainstream ideas is a signature 
of a mature scientific discipline. In such a 
culture, the low-hanging fruit has already 
been picked by small, versatile teams that are 
long gone. Critics argue that when funds are 
limited, the focus of research should be on 
coordinated approaches that are likely to 
produce results in a predictable way. This 
advocacy fails to appreciate that our main- 
stream paradigm might be heading in the 
wrong direction. The opportunity for making 
mistakes is much greater than for real break- 
throughs, so as any venture capitalist knows, 
investing part of the portfolio in risky endeav- 
ours is necessary to gain substantial profits. 


ALTERNATIVE PATHS 

The only way to work out whether we are on 
the wrong path is to encourage competing 
interpretations of the known data. 

I have been arguing for many years that 
funding agencies should promote the 
analysis of data for serendipitous purposes 
beyond major programmes and the main- 
stream dogma. The need for a change in 
course is even more pressing now. Empirical 
constraints on expected forms of dark matter 
(such as weakly interacting massive particles 
or supersymmetric partners to known parti- 
cles) are getting tighter, and the hope of iden- 
tifying testable consequences of string theory 
is receding. At a minimum, when funding
is tight, a research frontier should main- 
tain at least two ways of interpreting data 
so that new experiments will aim to select 
the correct one. This should apply to alter- 
natives to inflation when dealing with new 
cosmological data, and to alternatives
to cold dark matter when discrepan- 
cies are observed in the properties of 
dark-matter-dominated galaxies. 

New funding streams should be 
established in other fields. The LIGO 
discovery of black-hole mergers should 
encourage a ‘template-free’ search for 
new sources of gravitational waves that 
were never imagined. The Kepler satellite’s
discovery that roughly one-quarter
of all stars in the Galaxy host a habitable
Earth-mass planet should lead to a
renewed effort in the search for extraterrestrial
life, including new methods for
finding intelligent civilizations. Indeed,
a habitable planet was recently discovered
around the nearest star to our
Sun, Proxima Centauri, which could be
probed with a future spacecraft
(http://breakthroughinitiatives.org/Concept/3).

A healthy dialogue between different 
points of view should be fostered through 
multidisciplinary conferences that discuss 
conceptual issues, not just experimental 
results and phenomenology. A diversity 
of views fosters healthy progress and pre- 
vents stagnation. In September, I had the 
privilege of founding an interdisciplinary 
centre, the Black Hole Initiative at Harvard 
University in Cambridge, Massachusetts, 
which brings together astronomers, physi- 
cists, mathematicians and philosophers. 
Our experience is that a mix of scholars 
with different vocabularies and com- 
fort zones can cultivate innovation and 
research outside the box. Already the 
centre has prompted exciting insights on 
the reality of naked singularities in space- 
time, the prospects for imaging black-hole 
silhouettes and the information paradox. 

Such simple, off-the-shelf remedies 
could help us to avoid the scientific 
fate of the otherwise admirable Mayan 
civilization.

A lava flow from the Puu Oo volcanic cone in Hawaii.


Bridge the 
planetary divide 


To explain why our planet is habitable, geoscientists 
studying Earth’s surface and interior must work with 
each other and with communications scholars, write 

Ariel D. Anbar, Christy B. Till and Mark A. Hannah. 


The classic 1970s British television
series Upstairs, Downstairs is a good
metaphor for our planet’s evolution.
Like the show, Earth's habitability depends
on the dynamics of a complex household,
and on subtle interactions between
divided worlds.

Upstairs, at its surface, Earth is rich in
molecular oxygen. O₂ is the second-most
abundant gas in the atmosphere, making
up 21% of our air. It reacts readily, so most
of Earth’s surface is oxidized. Downstairs,
by contrast, in Earth’s interior, molecular
O₂ is vanishingly rare. Materials brought
up from the mantle, such as volcanic rocks,
react with O₂ when they are exposed.
Earth’s oxidized surface is a veneer enveloping
a vast O₂ sink.

This contrast was not always so stark.
It changed halfway through the planet’s
history. Around 2.3 billion years ago, the
amount of O₂ in the atmosphere rose
above one part per million, beginning
an ascent to the high levels of today. This
Great Oxidation Event transformed Earth
and made intelligent life possible. Its cause
remains a mystery. Solving it is a key challenge
for Earth-systems scientists. It is also a
challenge for astrobiologists: their ability
to use O₂ as a signature of life on planets
beyond our Solar System hinges on a better
understanding of how it arose on Earth.

Key to that story is the balance between
organisms’ production of O₂ on the surface
and consumption of the gas by reactions
with rocks, fluids and gases from the interior.
But we lack a quantitative theory
of our planet’s evolution that links
changes at the surface with those below.

In part, that is because the surface and
solid Earth research communities struggle
to communicate with each other. After examining
interactions within our own large
Dynamics of Earth System Oxygenation
team, funded by the US National Science
Foundation, we learned that researchers in
these neighbouring fields barely speak the
same language. Our challenges are as much
sociological as they are scientific.

“Few scientists and engineers realize how deeply
language affects their collaborations and research.”

A fuller theory of Earth evolution will
emerge by bridging three divides. First,
geoscientists studying the surface history
of O₂ (typically geobiologists and
low-temperature geochemists) need to understand
how the gas is influenced by what goes
on below. Second, geoscientists studying
Earth's interior (geophysicists and
high-temperature geochemists) must realize
that such questions are also germane to
some of their most vexing challenges. Third,
geoscientists of all stripes should improve
their conversations by integrating methods
from communications disciplines.


SURFACE PUZZLES 
Geoscientists trying to explain the rise
of O₂ in Earth’s atmosphere increasingly
realize that something prevented this gas
from accumulating for a billion years or
more before the Great Oxidation Event.
Geological evidence is mounting that
photosynthesis was producing O₂ as early
as 3.5 billion years ago. Microbial ‘mats’
in shallow waters at ancient seashores,
preserved today as fossilized stromatolites
that date back to at least that time, could
have been inhabited by O₂-making cyanobacteria.
Studies of the abundances of
carbon, molybdenum and other elements
and their isotopes in marine shales and
carbonate rocks support this picture.

Interactions between the surface and
interior of the Earth are implicated. Rocks
derived from the mantle, such as basalts,
consume O₂ when they weather. Oxygen
also reacts with hydrogen and other gases
released from volcanoes, hydrothermal vents
and mineral reactions. Because the atmosphere
is thin compared with the planet's
internal bulk, even small changes in the
rates at which these rocks and gases consume
O₂ could have a big impact. Those changes
might arise from alterations in the compositions
of materials derived from the mantle,
or in the rates at which they are brought to
the surface or are dragged back down by the
subduction of tectonic plates.


COOLING TROUBLES 

Many solid Earth scientists are unaware 
that the quest to understand the rise of O₂
in Earth’s atmosphere can provide new 
impetus to investigations of fundamental 
aspects of the planet’s internal evolution. 

As Earth cooled after its formation, 
mantle convection may have slowed. The 
abundance of iron and magnesium in mag- 
mas derived from the mantle decreased. 
The modern tectonic processes that recy- 
cle crust into the mantle kicked in. And 
the crust became richer in silicon dioxide
(SiO₂), and more buoyant. The distribution
of heat and elements in the mantle
was altered as surface minerals
were mixed in and as iron-nickel alloy
was steadily lost to Earth’s growing core.

Such cascades of changes probably
affected O₂ at the surface. By the time of
the Great Oxidation Event, the rate of O₂
consumption by reaction with rocks and
gases originating in Earth’s interior may
have slowed enough that O₂ produced by
photosynthesis could accumulate in the
atmosphere.
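
The balance between production and consumption described here can be illustrated with a toy one-box calculation. This is only a sketch under stated assumptions, not the authors' model: the function name `simulate_o2`, the linearly declining sink and every number are hypothetical, and all quantities are in arbitrary units. It shows one thing only, that a constant photosynthetic source leaves no O₂ in the air until the interior sink weakens below it.

```python
def simulate_o2(source, sink_start, sink_end, steps=101, dt=1.0):
    """Toy one-box model of atmospheric O2 (arbitrary, hypothetical units).

    source     -- constant O2 production by photosynthesis per time step
    sink_start -- capacity of rocks and gases to consume O2 at the start
    sink_end   -- the same capacity at the end (it declines linearly)
    Returns the list of O2 levels after each time step.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    o2 = 0.0
    history = []
    for i in range(steps):
        frac = i / (steps - 1)
        sink = sink_start + frac * (sink_end - sink_start)
        # O2 builds up only when the source exceeds the sink capacity;
        # the reservoir cannot drop below zero.
        o2 = max(0.0, o2 + (source - sink) * dt)
        history.append(o2)
    return history


# Hypothetical run: the sink starts at twice the source and ends at half of it.
levels = simulate_o2(source=1.0, sink_start=2.0, sink_end=0.5)
first_rise = next(i for i, level in enumerate(levels) if level > 0.0)
print("O2 first accumulates at step", first_rise)
```

In this sketch the timing of the rise is set entirely by when the sink falls below the source, which is why the rates of interior processes matter so much to the argument.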

None of these changes is well-quantified.
For instance, whether the upper mantle’s
capacity to consume O₂ evolved or not is still
debated. For 20 years, researchers thought
that it did not. But recent measurements
(some conducted by members of our team)
of vanadium and scandium in ancient rocks
derived from the mantle indicate that its O₂
consumption capacity might have fallen
in the 1.5-billion-year run-up to the Great
Oxidation Event. Changes in the composition
of the continental crust also suggest a
decrease in O₂ consumption by rock-weathering
processes around the same time.

Thus, unravelling the mystery of Earth’s 
O₂ requires a quantitative theoretical
model of the physics and chemistry of 
planetary cooling and its consequences 
for surface-interior interactions over time. 


LOOKING BEYOND 

Such a model would have benefits beyond
the geosciences. Astronomers are hoping
to use O₂ as a fingerprint of life on
Earth-like exoplanets. But will O₂ inevitably
accumulate if biological processes
produce it in large amounts? Stars have
a wide range of abundances of elements
such as carbon, magnesium and silicon.
The exoplanets that form around them
must vary in their compositions too,
which would affect their tectonics and
internal chemistry. So, on some worlds,
the rate of surface-interior interactions or
the mantle’s capacity to consume O₂ may
be so high that the gas cannot accumulate.
On such worlds, O₂ may be useless as a
signature of life.

Astronomers need to know which exoplanets
are worth investigating intensely
for O₂, and for which this might be a waste
of precious telescope time. A quantitative
model that incorporates a wide range of
planetary compositions would indicate
which Earth-like planets have a chance
of developing O₂-rich atmospheres and
which will never do so even if they are
teeming with O₂-producing life.

CROSSING DIVIDES


To bridge these disciplinary divides takes 
effective conversation. Yet few scientists 
and engineers realize how deeply language 
affects their collaborations and research. 
To close this gap, our team developed a 
research partnership with some social sci- 
entists and humanities scholars who work 
on communication and team dynamics. 
Research in their field shows how diverse 
teams work more effectively when they 
develop a shared language — common 
vocabulary, jargon, codes and linguistic 
styles as well as implicit understandings.

Our first step was to examine the lan- 
guage that our investigators used, to 
identify and confront gaps in our group’s 
understanding of concepts related to 
Earth’s oxygenation. We were motivated by 
a paradox that we observed: scientists in 
closely allied disciplines find it hardest to 
communicate effectively with one another. 
Astronomers, biologists and geoscien- 
tists are willing to ask each other ‘dumb’ 


questions that expose shared and divergent 
understanding. But a geobiologist working 
with a geophysicist might assume a shared 
understanding that does not exist. 

For example, solid Earth scientists and
geobiologists share the word ‘oxygenation’
but in fact lack a common language to
describe the amount of O₂ that is available
to react. Geobiologists, used to high
atmospheric levels, think in terms of O₂
partial pressures and molarities when dissolved
in solution, and so have developed
a specialized vocabulary to describe environments
with different amounts of free
O₂ (such as ‘oxic’, ‘anoxic’, ‘suboxic’ and
‘euxinic’). Solid Earth scientists use the
physico-chemical term, ‘oxygen fugacity’,
to reflect the fact that oxygen in the deep
Earth is mainly locked in minerals and not
in the form of an ideal gas. So conversation
is stalled by even a seemingly simple
question such as: ‘How can we compare
a sediment’s capacity to consume O₂
relative to the mantle?’

“Scientists in closely allied disciplines find it
hardest to speak to one another.”

Scholars who study how people share 
ideas have analytical skills and methods 
that can address this challenge. These 
begin with carrying out surveys and 
interviews, and designing visualizations 
to demonstrate differences in use of lan- 
guage and its impacts (see Supplementary 
Information; go.nature.com/2e0gypi). 

Such data feed into analyses of social 
networks that help team leaders to iden- 
tify and empower investigators most able 
to bridge subdisciplines — in our case, 
people who score highly on understanding 
both surface and deep Earth terms — or to 
identify people with hybrid knowledge who 
can address particular points of overlap. 
Because investigators are attuned mainly 
to their own group’s language, efforts must 
be made to help each group appreciate 
how their concepts relate to others’ and 
how each perspective informs the research 
questions pursued. 
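
One way to picture the step of identifying investigators who understand both vocabularies is a simple ranking. The sketch below is an assumption for illustration, not the team's actual method: the function name `rank_bridgers`, the investigator labels and the scores are all hypothetical, and taking the smaller of two survey scores is just one plausible definition of 'scoring highly on both'.

```python
def rank_bridgers(scores):
    """Rank investigators by their understanding of BOTH vocabularies.

    scores maps a (hypothetical) investigator label to a pair
    (surface_score, deep_score), each between 0 and 1.
    The bridging score is the smaller of the two, so a specialist
    fluent in only one vocabulary ranks low.
    """
    bridging = {name: min(surface, deep)
                for name, (surface, deep) in scores.items()}
    # Highest bridging score first; ties are broken alphabetically.
    return sorted(bridging, key=lambda name: (-bridging[name], name))


# Hypothetical survey results: (surface-term score, deep-Earth-term score).
survey = {
    "geobiologist_A": (0.9, 0.2),
    "geophysicist_B": (0.3, 0.95),
    "geochemist_C": (0.7, 0.8),
}
print(rank_bridgers(survey))
```

Using the minimum rather than the average keeps a strong score in one vocabulary from masking a weak score in the other, which is the point of looking for people with hybrid knowledge.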

By gaining such awareness and working 
together more effectively, geoscientists stud- 
ying Earth’s surface and interior, drawing 
on analyses of discourses and team dynam- 
ics, can build a model for the evolution of 
Earth's O₂ rise, better understand the history
of Earth’s habitability, and inform the search
for life on worlds beyond our own.


Explorer Sam Crossman descends into Vanuatu’s fiery Marum lava lake.

Pages from the ‘herbal’ section of the Voynich manuscript.

Calligraphic conundrum


Andrew Robinson relishes a new volume on a work that has long defied decoders. 


The Voynich Manuscript
EDITED BY RAYMOND CLEMENS
Yale University Press

In a Connecticut archive sits a manuscript
justifiably called the most mysterious in
the world. Since its rediscovery more than
a century ago, the Voynich manuscript has
been puzzled over by experts ranging from
leading US military cryptographer William
Friedman to cautious (and incautious)
humanities scholars. Since 1969, it has been
stored in Yale University’s Beinecke Rare
Book and Manuscript Library in New Haven.

The fine calligraphy of the 234-page
‘MS 408’, apparently alphabetic, has never
been decoded. Copious illustrations of bathing
women, semi-recognizable plants and
apparent star maps remain undeciphered.
No one knows who created it or where, and
there is no reliable history of ownership. Its
parchment was radiocarbon-dated in 2009
to between 1404 and 1438, with 95% probability.
The manuscript could still be a forgery
using medieval parchment, but most experts,
including Yale's, are convinced it is genuine.

Now The Voynich Manuscript, a volume
edited by the library’s curator, Raymond
Clemens, revivifies this tantalizing artefact.
A handsome facsimile is accompanied by an
introduction by historian of science Deborah
Harkness and six up-to-date essays by conservators,
historians and literary scholars. As
Harkness remarks, the collection does not
attempt a definitive solution of the conundrums
raised by the manuscript. Instead,
the contributors “invite the reader to join us at
the heart of the mystery as we strive to
better understand this complex book and its
history”.

The manuscript is named after Wilfrid
Voynich, a Polish-born revolutionary who
escaped Siberian exile to become a dealer in
rare books and manuscripts in London and 
then New York City. In 1912, under condi- 
tion of absolute secrecy, Voynich bought 
the manuscript from a Jesuit archive in Italy 
that he never identified, but that was selling 
part of its collection to the Vatican Library. 
Up to his death in 1930, Voynich marketed 
it as the work of thirteenth-century English 
scientist and friar Roger Bacon — a theory 
discredited only by the radiocarbon dating. 
In 1961, the manuscript was purchased by 
book dealer Hans Kraus, who presented it to 
Yale; the university put it online in 2004 (see 
go.nature.com/2dlns1c). Today, it attracts
16% of traffic to the Beinecke digital library,
and accounts for close to half of all traffic to the zoom
viewer that allows examination of indi- 
vidual pages of manuscripts. When the late 
semiotician and novelist Umberto Eco visited 
Yale in 2013, MS 408 was the only manuscript 
he asked to see, notes Clemens. 

The high-quality colour facsimile makes 
up most of the book. Each page, including 
foldouts, is reproduced at almost its original 
size (around 23 x 16 centimetres). Among 
the essays, researcher René Zandbergen cov- 
ers the disputed history of the manuscript's 
ownership from the fifteenth century to 1912. 
Manuscript curator Arnold Hunt looks at 
Voynich’s respected, but not always scru- 
pulous, career. A group of six conservators 
details the forensic investigation of the parch- 
ment, ink and binding. And historian Jennifer 
Rampling probes the relationship of the illus- 
trations to those in alchemical manuscripts, 
finding “no clear parallel” in alchemical writ- 
ing to the predominance of female bathers. 

The story of the various failed attempts 
to decipher the script, told by Clemens and 
Renaissance scholar William Sherman, is 
particularly fascinating. It begins in the 1920s, 
when US philosopher William Newbold 
convinced himself that the text was mean- 
ingless, but that each letter concealed an 
ancient Greek shorthand readable under
magnification. He further claimed that 
this ‘finding’ proved the authorship of 
Bacon, who he claimed had invented a 
microscope centuries before Antonie van 
Leeuwenhoek. After Newbold’s death, the 
‘shorthand’ was revealed to be random 
cracks left by drying ink. 

Wisely, little space is devoted to the 
many speculative theories of origin and 
meaning. The manuscript has been cast, 
for instance, as a Middle High German 
hygiene manual written in ‘mirror writ- 
ing’ — the technique used by Leonardo 
da Vinci — and as a herbal manuscript in 
the Aztec language Nahuatl. (Readers with 
a taste for these can consult The Voynich 
Manuscript (Orion, 2004), a study by 
Gerry Kennedy and Rob Churchill.) And 
no new decipherment is offered. 

Some idea of the complexity of the 
story is shown by a letter in Latin that 
Voynich apparently found affixed to 
the manuscript. This is reproduced in 
the volume, oddly without translation. 
Dated 1665, it was written by Johannes 
Marcus Marci (physician to the Holy 
Roman Emperors) and addressed to 
his former tutor, the Jesuit Athanasius 
Kircher. (The foremost polymath of the 
age, Kircher was wrongly credited with 
deciphering Egyptian hieroglyphs.) In 
the letter, Marci notes that he is sending 
Kircher the entire manuscript, asks him 
to decode it and mentions the claim that 
Bacon authored it. We know that Kircher 
received the manuscript, but made no 
progress with it. After his death in 1680, it 
disappeared into Jesuit archives in Rome 
until it came into Voynich’s hands. 

What hope is there of decoding the 
script? Not much at present, I fear. The 
Voynich manuscript reminds me of 
another uncracked script, on the Phaistos 
disc from Minoan Crete, discovered in 
1908. The manuscript offers much more 
text to analyse than does the disc, but in 
each case there is only one sample to work 
with, and no reliable clue as to the under- 
lying language — no equivalent of the 
Rosetta Stone (A. Robinson Nature 483, 
27-28; 2012). Professional cryptographers 
have been rightly wary of the Voynich 
manuscript ever since the disastrous self- 
delusion of Newbold. But inevitably, many 
sleuths will continue to attack the problem 
from various angles, aided by this excellent 
facsimile. Wide margins are deliberately 
provided for readers’ notes on their own 
ideas. “Bonne chance!” writes Clemens. I'll 
second that.


Andrew Robinson’s many books include
The Man Who Deciphered Linear B and 
Lost Languages. 

e-mail: andrew@andrew-robinson.org 
Obaysch the hippo was captured in 1849 and sent to London Zoo, where he became a sensation.


Animal crackers 


Henry Nicholls relishes a brace of chronicles on how 
zoos on both sides of the Atlantic came to be. 


The Zoo: The Wild and Wonderful Tale of the
Founding of London Zoo
ISOBEL CHARMAN
Viking: 2016.

The Animal Game: Searching for Wildness at
the American Zoo
DANIEL E. BENDER
Harvard University Press: 2016.

Stamford Raffles did not waste his time.
In 1825, little more than six months
after returning to London from the
East Indies, he’d put together a prospectus
that would result in the creation of the Zoological
Society of London (ZSL). With his
career as an entrepreneur-cum-statesman in
Penang, Java, Sumatra and Singapore behind
him, Raffles was ready to indulge his passion
for natural history.

The relationship between humans and
the rest of the animal kingdom has always
changed and will always change. But there can
be few shifts as rapid and radical as those in
the nineteenth century. With the age of sail in
full swing and European docksides piled with
boxes of specimens, a new class of professional
zoologist arose. The likes of Alexander
von Humboldt, Charles Darwin and Alfred
Russel Wallace began to make sense of the
astounding variety of animal life. The period
covered in Isobel Charman’s The Zoo, 1824–51,
saw much of the transformative action.
Meanwhile, historian Daniel Bender’s The
Animal Game chronicles the evolution of the
US zoo from the 1870s to the 1970s.

Charman has hit on a delightful structure
for her “wild and wonderful tale”. Each chapter
is a leg in a relay. So Raffles hands over
to Decimus Burton, the ambitious twenty-something
architect who began to shape
the zoological gardens in Regent's Park in
1827. Veterinary surgeon Charles Spooner 
is next, his struggle to keep the animals alive 
in the 1830s foreshadowing concerns over 
animal welfare. On the upside, the deceased 
creatures were a bonus for chief animal preserver
John Gould and the ZSL’s museum in
Mayfair, a collection subsumed into the British
Museum in 1855. Gould pops up again
later with a selection of Galapagos finches, 
which helped Darwin to develop the case for 
evolution by natural selection. Thus Char- 
man takes the story out of the cages and 
onto the smoggy, sometimes riotous streets 
of Victorian London, up and down the coun- 
try and beyond its shores. 

With an imperial network of travellers 
and traders, customs officials and collec- 
tors, dealers and diplomats all sending speci- 
mens to London, the nascent zoo boasted 
an impressive diversity of animals from the 
off. During its first decades, the zoological 
gardens remained relatively exclusive, as its 
fellows mulled over the peculiar physiology, 
morphology and behaviour of inmates such 
as a chimpanzee that they named Tommy. 
The public, meanwhile, indulged their curiosity
at venerable but unashamedly unscientific
menageries such as the Exeter Exchange
on the Strand in London. Soon, Raffles’
original vision, emphasizing the animals as 
objects of scientific research, became hard 
to sustain. The ZSL gave up its farm outside 
Kingston upon Thames — a site for breeding 
and experiments — and opened the gates of 
the zoological gardens in the 1840s. It also 
began to bring in crowd-pleasers such as 
giraffes and hippopotamuses. 

What is perhaps most striking about The
Zoo is its style. “He could see it now,
in his mind’s eye,” Charman writes as
Raffles’ plans take shape. “All of these
beasts, from across the earth, from
every corner of the endless Empire,
gathered right here! He could almost hear
the roars.” It is clear that a huge amount of
research has gone into this work, but much 
is lost among imagined thoughts and feel- 
ings. In writing non-fiction novelistically, 
Charman fails to take full advantage of the 
strengths of either genre. 

Bender takes a more conservative
approach in The Animal Game. Given the
imperial premise of a robust zoological
collection, it makes perfect sense that the
appearance of US zoos should map onto the
emergence of the nation as a global power
towards the end of the nineteenth century.
The Philadelphia Zoological Society,
the country’s oldest, appeared in 1859,
although it achieved a permanent collection
(including an Asian elephant, Jennie) only
some 15 years later. Others included Lincoln
Park Zoo (Chicago, Illinois, in 1868),
the National Zoo (Washington DC, 1889)
and the Bronx Zoological Park in New York
City (1899). In many ways, the stories of
these institutions mirror the ZSL’s. As with
Raffles, “those urban elites who dreamed of
zoological parks fantasized that displays of
biological order would beget social order”,
writes Bender: nature’s hierarchy offered a
model to the masses. Also like the ZSL, US
zoological societies struggled to distinguish
themselves from low-brow entertainment:
menageries, circuses and world fairs.

William Mann, fifth director of the National Zoo in Washington DC, travelled widely to collect animals.

Whereas the animal trade is implicit in 
The Zoo, Bender's book renders it explicit 
through characters such as Frank Buck — 
animal dealer, showman and film star — and 
William Mann, the pith-helmeted director of 
the National Zoo from 1925 to 1956. In 1937, 
Mann travelled to the Far East to secure tigers 
(at US$100 a pair), a Sumatran gibbon, casso- 
waries, orang-utans and more. In Africa, he 
had to manage his own collecting, on one 
occasion employing 500 locals in a failed 
attempt to net a giraffe. When he sailed for the 
United States, he had a cargo of 1,500 animals, 
but many were dying: a crate of 14 pythons 
was summarily thrown overboard. 

During the Great Depression, bankrolled
by federal coffers, many zoos began
to reconfigure with new displays such as
“monkey islands” surrounded by moats.
Captive breeding eventually appeared, as 
much by necessity as design: by the 1960s, 
centuries of exploitation had devastated wild 
animal populations. The only way that zoos 
could survive was to put more effort into 
breeding their own animals. 

Bender zips back and forth from institu- 
tion to institution, on collecting trips with 
traders, and with visitors feeding the animals 
snacks (and on occasion, broken bottles 
and roofing nails). He draws on rich archi- 
val material, including the thoughts of key 
players such as zoologist William Temple 
Hornaday, memos from management and 
a flurry of clippings from the popular press. 
In the century that he covers, US zoos faced 
challenges from economic ups and downs 
to rogue animals, disgruntled staff and a 
demanding public. 

What emerges is a story of adaptation and 
survival that exposes the modern zoo as “a 
third nature”, a “product of how we built,
lived, and contested empires. It is wildness 
and wilderness suspended at the moment 
of their initial enclosure when there were 
still plenty of animals for the taking.” Those 
who are ethically opposed to zoos will find 
plenty here to strengthen their case. But with 
zoos’ power of reinvention, it seems likely
that this “third nature” will be with us for
some time.


Henry Nicholls is a journalist based in 
London. His latest book is The Galapagos. 
e-mail: henry@henrynicholls.com 
Norway wolf cull will
hit genetic diversity 


Norway's regional management 
authorities have approved plans 
to cull up to 70% of its grey wolves 
(Canis lupus). Because the current 
population consists of just 65-68 
individuals, of which only 21 are 
thought to be sexually mature 
(see go.nature.com/2euyixy), this 
cull and the subsequent loss of 
genetic material would seem to be 
a significant misstep for ensuring 
the wolf’s persistence in Norway. 
Preserving genetic variation is 
crucial for the long-term survival 
of populations in fragmented 
landscapes. The genetic diversity 
of these wolves is already low 
because the entire population 
is descended from just a few 
individuals (C. Vila et al. Proc. 
Biol. Sci. 270, 91-97; 2003). 
Their small population size and 
founding history mean that 
they need genetic input from 
immigrant wolves from Sweden, 
Finland and Russia to survive. 
Otherwise, they face the same 
fate as the well-studied Isle Royale 
wolf population in the United 
States (see Nature 520, 415; 2015). 
Livestock protection is the 
Norwegian government's main 
justification for the cull, although 
less than 9% of Norway’s sheep 
are taken by wolves (data from 
go.nature.com/2eemukj). The 
government is ignoring well- 
established scientific practices for 
managing a critically threatened 
species, as well as overriding 
its commitment to the Bern 
Convention, which lists the wolf 
as a strictly protected species. 
Elina Immonen Uppsala 
University, Sweden. 
Arild Husby University of 
Helsinki, Finland. 
arild.husby@helsinki.fi 


Inequality: need for 
data on all nations 


Inequality studies need to be 
more representative — the 
countries that suffer the most 
from inequality are also those 
that we know the least about (see
Nature 537, 466-470; 2016).
Between 1994 and 2013, North 
America and western Europe 
together accounted for 80% of 
all publications on inequalities 
and social justice (ISSC, IDS and 
UNESCO World Social Science 
Report; UNESCO, 2016). The 
number of publications and data 
on these themes are very low from 
sub-Saharan Africa, south and 
east Asia, former eastern-bloc 
countries and the Arab world. 
This imbalance in our knowledge 
about inequality diminishes our 
global capacity to address it. 
Mathieu Denis International 
Social Science Council, Paris, 
France. 
Melissa Leach Institute of 
Development Studies, University 
of Sussex, Brighton, UK. 
mathieu@worldsocialscience.org 


Inequality: span the 
global divide 


National initiatives need to 
correct injustices related to class, 
inequality and salaries among 
scientists (see Nature 537, 
466-470; 2016). However, such 
measures may serve to reinforce 
the global north-south divide in 
research if, perhaps inevitably, 
they are more prevalent in 
higher-income countries. 

The domination of the scientific 
agenda and literature by northern 
over southern researchers has 
serious implications for how 
science is designed and produced, 
undermining its salience, 
credibility and legitimacy — and 
therefore its influence on policy 
development and implementation 
(see D. W. Cash et al. Proc. Natl 
Acad. Sci. USA 100, 8086-8091; 
2003). 

We suggest that the world’s 
many transboundary issues — 
such as climate change, poverty, 
human migration, public health 
and biodiversity decline — call 
for a more comprehensive, 
global approach. This should 
span the north-south divide by 
addressing the underlying issues 
and their consequences (see, 
for example, M. Blicharska et al.
Nature Clim. Change; in the press).
Richard J. Smithers Ricardo 
Energy & Environment, Harwell, 
UK. 

Malgorzata Blicharska Uppsala 
University, Uppsala, Sweden. 

José Maria Gutiérrez University 
of Costa Rica, San José, Costa Rica. 
richard.smithers@ricardo.com 


Protect aquaculture 
from ship pathogens 


Aquaculture is the world’s fastest- 
growing food-production sector 
and a crucial contributor to the 
United Nations’ Sustainable 
Development Goals. As a group 
of scientists, ocean-policy experts, 
aquaculture professionals and 
technical consultants from 
international organizations,
we argue that, despite recent
legislation, fish farms may still be 
at risk from pathogens in ballast 
water discharged from ships. 

The International Convention 
for the Control and Management 
of Ships’ Ballast Water 
and Sediments (go.nature.com/2evuskh)
will come into
force in September 2017. This 
will reduce the risks of transfer 
of organisms larger than 
10 micrometres and of bacteria 
that are harmful to humans, 
including Vibrio cholerae, 
Escherichia coli and Enterococcus 
species. However, the convention 
does not mention any other 
aquatic bacteria or viruses that 
could cause epidemics in the 
US$160-billion aquaculture 
industry and threaten food 
security. 

We suggest using a 
combination of molecular tools, 
experimental investigation, 
monitoring data and operational 
models to evaluate the risks 
and possible impacts on local 
aquaculture farms of how, when 
and where ships discharge ballast 
water. The findings should be 
presented to UN-OCEANS 
(www.unoceans.org) and the 
Joint Group of Experts on the 
Scientific Aspects of Marine 
Environmental Protection 
(www.gesamp.org) to support
existing regulatory frameworks.
Guillaume Drillet* DHI, 
Singapore. 

gdr@dhigroup.com 

*On behalf of 11 correspondents (see 
go.nature.com/2en2btv for full list). 


Small data call 
for big ideas 


Despite the meteoric rise in 
big-data approaches, funders also 
need to recognize that some of 
the most pressing problems must 
instead rely on the intelligent use 
of small data sets. 

Value-of-information analysis 
can evaluate whether big- 
data collection is worthwhile 
(R. Schlaifer and H. Raiffa 
Applied Statistical Decision 
Theory; Clinton, 1961). 
Collecting big monitoring data 
for threatened or invasive species, 
for instance, risks delaying 
decisions on protective measures. 
It might be better to fund direct 
action, as for killer whales in the 
Georgia basin (E. McDonald- 
Madden et al. Trends Ecol. Evol. 
25, 547-550; 2010). 
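
The value-of-information idea above can be made concrete with a small expected-value calculation. The sketch below is illustrative only: the two states, their probabilities and the payoffs are hypothetical numbers invented for this example (they are not taken from the cited studies). It computes the expected value of perfect information (EVPI), one common form of value-of-information analysis.

```python
# Hypothetical decision problem: every number below is illustrative,
# not taken from the studies cited in the text.
state_probability = {"declining": 0.6, "stable": 0.4}

# Payoff (arbitrary units) of each action under each true state.
payoff = {
    "act_now": {"declining": 80.0, "stable": 50.0},
    "monitor_first": {"declining": 20.0, "stable": 70.0},
}


def expected_payoff(action):
    """Expected payoff of one action under current uncertainty."""
    return sum(p * payoff[action][s] for s, p in state_probability.items())


# Best we can do today: commit to the action with the highest expectation.
value_without_info = max(expected_payoff(a) for a in payoff)

# Best we could do if the true state were revealed before choosing.
value_with_info = sum(
    p * max(payoff[a][s] for a in payoff) for s, p in state_probability.items()
)

# Expected value of perfect information: an upper bound on what
# any further data collection could be worth.
evpi = value_with_info - value_without_info
print(f"EVPI = {evpi:.1f}")
```

Under these assumed numbers, data collection that costs more than the EVPI (in the same units) would not be worthwhile, and funding direct action would be the better choice.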

Urgent decisions may be 
necessary when information is 
sparse. In agricultural systems,
a fast response to a new pest or
disease outbreak can make the 
difference between eradication 
and decades of costly quarantine 
programmes. In ecology, 
population sizes and detectability 
are often too low to create big data 
sets. In health, defence and social 
sciences, collecting big data risks 
violating human ethics. 

Where data are limited, 
scientific solutions underpinned 
by strategies such as adaptive 
management can optimize 
decision making (I. Chadès et al.
Theor. Ecol. http://doi.org/br9s; 
2016). Artificial intelligence, for 
example, can inform adaptation 
strategies for sea-level rise to 
protect migratory shorebirds 
worldwide (S. Nicol et al. Proc.
R. Soc. B 282, 20142984; 2015).
Iadine Chadès, Sam Nicol
CSIRO, Dutton Park,
Queensland, Australia.
iadine.chades@csiro.au
BRIEF COMMUNICATIONS ARISING


The intrinsic thermal conductivity of SnSe 


ARISING FROM L.-D. Zhao et al. Nature 508, 373-377 (2014); doi:10.1038/nature13184 


Several groups have been unable to reproduce the record high thermoelectric
figure of merit ZT of SnSe reported in ref. 1. Zhao et al.¹ measured an
ultralow thermal conductivity (<0.4 W m⁻¹ K⁻¹ at 923 K) and consequently
record high values of ZT ≈ 2.6 ± 0.3 and ZT ≈ 2.3 ± 0.3 at 923 K along the
b and c directions, respectively, in their single-crystalline SnSe. However,
after careful analysis of the data of ref. 1, we deduce that their samples
are not fully dense and thus not truly single crystalline, implying that
their reported thermal conductivities are not intrinsic to SnSe. This
warrants further investigation into intrinsic thermal transport in SnSe
single crystals and its use as a thermoelectric material. There is a Reply
to this Comment by Zhao, L.-D. et al. Nature 539,
http://dx.doi.org/10.1038/nature19833 (2016).

In ref. 1, the total thermal conductivity κ of single-crystalline SnSe
was calculated using the relation κ = D C_p ρ, where D is the thermal
diffusivity, C_p is the specific heat capacity at constant pressure, and
ρ is the density. Although the authors did not list the density of their
SnSe crystals, they did provide the diffusivity, specific heat and total
thermal conductivity data along the three major crystallographic
directions (source data for figure 2d, and extended data figure 6a and b, of
ref. 1). We extracted the ρ value for their crystals using the above relation,
and found their single-crystalline SnSe samples to be of much lower
density (around 5.43 g cm⁻³) when compared to the theoretical
density ρ_th reported in the literature (Table 1). In their Reply to this
Brief Communication Arising, Zhao et al. confirmed that the experimentally
measured density of their samples was indeed lower than ρ_th.
The ρ values of single-crystalline SnSe estimated from powder
X-ray diffraction studies were in the range 6.13–6.18 g cm⁻³ (Joint
Committee on Powder Diffraction Standards (JCPDS) card numbers
01-089-232, 01-089-233 and 01-089-235)², while neutron diffraction
studies³ reported a ρ value of about 6.18 g cm⁻³ (JCPDS card number
01-071-3877). In addition, electron diffraction studies reported a
ρ value of 6.07 g cm⁻³ (JCPDS card number 01-075-2123)⁴. In other
words, the ρ value for the SnSe samples used in ref. 1 was 88% to 90%
of the theoretical density ρ_th.
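
The density extraction described above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming only the room-temperature values listed in Table 1 and the theoretical densities quoted in the text; the function and variable names are ours and are not part of ref. 1.

```python
# Values at about 300 K from the source data of ref. 1, as listed in Table 1.
CP = 0.252  # specific heat, J g^-1 K^-1 (same for all three axes)
measured = {
    "a": {"kappa": 0.46525, "diffusivity": 0.33983},
    "b": {"kappa": 0.70014, "diffusivity": 0.51139},
    "c": {"kappa": 0.67560, "diffusivity": 0.49347},
}


def density_g_per_cm3(kappa, diffusivity_mm2_s, cp_j_per_g_k):
    """Solve kappa = D * C_p * rho for rho, converting to SI and back."""
    d_si = diffusivity_mm2_s * 1e-6   # mm^2/s -> m^2/s
    cp_si = cp_j_per_g_k * 1e3        # J/(g K) -> J/(kg K)
    rho_si = kappa / (d_si * cp_si)   # kg/m^3
    return rho_si / 1e3               # kg/m^3 -> g/cm^3


densities = {
    axis: density_g_per_cm3(v["kappa"], v["diffusivity"], CP)
    for axis, v in measured.items()
}

# Theoretical densities quoted in the text (g cm^-3).
theoretical = (6.07, 6.13, 6.18)

for axis, rho in densities.items():
    fractions = [rho / rho_th for rho_th in theoretical]
    print(f"{axis} axis: rho = {rho:.2f} g/cm3, "
          f"{100 * min(fractions):.1f}%-{100 * max(fractions):.1f}% of theoretical")
```

All three axes give essentially the same density, which is expected if the same crystal density entered each of the original κ calculations.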

Although the exact cause of the low ρ value in their SnSe samples
is beyond our conjecture, in Supplementary Table 1 we compare
the reported density and the thermal conductivity values given
by several groups for SnSe at 300 K and 750 K, measured along
different pressing directions or crystallographic axes. As can be
deduced from Supplementary Table 1 and the corresponding scatter
plot (Fig. 1), the κ values and the corresponding densities of fully
dense single and polycrystalline SnSe reported by different groups
are consistently higher than those reported in ref. 1. In the case of
polycrystalline SnSe, additional phonon scattering mechanisms are
present, which should lead to even lower lattice thermal conductivity
κ_L compared to the corresponding thermal conductivity of
single-crystalline SnSe. Although Zhao et al., in their Reply to this


Table 1 | Room-temperature characteristics of the SnSe samples of ref. 1

At about 300 K                          a axis     b axis     c axis
Thermal conductivity, κ (W m⁻¹ K⁻¹)     0.46525    0.70014    0.67560
Diffusivity, D (mm² s⁻¹)                0.33983    0.51139    0.49347
Specific heat, C_p (J g⁻¹ K⁻¹)          0.252      0.252      0.252
Density, ρ (g cm⁻³)                     5.43       5.43       5.43

The densities of the samples of ref. 1 were deduced from the relation κ = D C_p ρ.
Figure 1 | A scatter plot of thermal conductivity versus density of
single and polycrystalline SnSe. The values are taken along different
crystallographic axes and along parallel or perpendicular pressing
directions, at 300 K and about 760 K, derived from the peer-reviewed
literature cited in Supplementary Table 1. The open and solid symbols
represent κ at 300 K and 760 K, respectively. The κ values of
single-crystalline SnSe along the a axis are represented by red circles and
along the b and c axes are represented by blue squares. The κ values
of polycrystalline SnSe along the parallel and perpendicular pressing
directions are represented by green triangles and violet inverted triangles,
respectively. The red ellipse encompasses data from ref. 1 and data
reported by other groups lie within the blue ellipse. A detailed list of all
the data points is available in Supplementary Table 1.


Brief Communication Arising, suggested the formation of SnO as a 
plausible cause for the higher thermal conductivity in polycrystalline 
SnSe, it must be noted that SnSe is not highly sensitive to air at ambient 
temperature’. 

It should also be noted that Zhao et al.¹ suggested the presence of
“plenty of vacancies and interstitials” in their samples. Some defects are 
entropically present in single crystals, but we doubt that the vacancies 
and interstitials can account for the observed 11%-12% deficiency in 
density at room temperature. It is well known that porosity signifi- 
cantly reduces the thermal transport, which emphasizes the impor- 
tance of reporting packing density values in future publications to 
validate the intrinsic transport properties. In other words, true thermal 
conductivity cannot be obtained by simple normalization, because 
thermal diffusivity and density are interdependent quantities. Here, 
we are not expressing concerns about the measurement techniques 
or the self-consistency of measurements by Zhao et al.¹ but about the
intrinsic nature of their SnSe single crystals. 

Thermoelectricians have long aimed to optimize ZT by reducing 
κ, and ref. 1 reported exceptionally low thermal conductivity. Thus,
the sole aim of this Comment is to correct the scientific record by
stating that the ultralow κ value reported in ref. 1 is not intrinsic
to fully dense single-crystalline SnSe. A single crystal, by definition, 
must have an experimentally measured density that is close to 100% 
of the theoretical density. Thus, the SnSe samples of ref. 1 cannot 
be classified as single crystalline and the thermal conductivity and 
figure of merit values for SnSe presented in ref. 1 are not intrinsic to 
single-crystalline SnSe. 
Zhao et al. reply


REPLYING TO P.-C. Wei et al. Nature 539, http://dx.doi.org/10.1038/nature19832 (2016)


In the accompanying Comment, Wei et al.¹ point out that the sample
density used to obtain the thermal conductivity value of SnSe crystals 
is about 10% lower than the theoretical value ρ_th, and as a result the
thermal conductivity is underestimated by about 10%. The data 
published in ref. 2 are not based on a single measurement or on a single 
specimen. In the supplementary information of ref. 2 we reported 
measurements for at least seven crystals with good reproducibility. 
The material is not a conventional crystal and has several unusual 
and confusing features, such as the crystallographic phase transition 
(causing numerous microcracks) and a tendency to oxidation³⁻⁶.
The SnSe crystal also has strong anisotropy and weak mechanical 
properties, leading to facile cleavage so that the crystal can easily 
be damaged during cutting and handling. In addition, challenging 
measurements of this type (that is, thermoelectric measurements on 
cut single crystals) can show a statistical spread and the error in the 
measurement of thermal conductivity is typically between 15% and 
20% (ref. 7). 

The sample density we reported is the value that we measured. 
Differences between our value and the value(s) measured by others 
could be caused by differences in sample characteristics arising from 
preparation methods and handling, or measurement errors. Regardless 
of whether the density of the SnSe crystal is 6.0 g cm⁻³ or 5.5 g cm⁻³, the
lattice thermal conductivity is ultralow, in the ranges 0.6–0.8 W m⁻¹ K⁻¹


E2 | NATURE | VOL 539 | 3 NOVEMBER 2016 




11. Sassi, S. et al. Assessment of the thermoelectric performance of polycrystalline 
p-type SnSe. Appl. Phys. Lett. 104, 212105 (2014). 

12. Zhang, Q. et al. Studies on thermoelectric properties of n-type polycrystalline
SnSe₁₋ₓSₓ by iodine doping. Adv. Energy Mater. 5, 1500360 (2015).

13. Wei, T.-R. et al. Thermoelectric transport properties of pristine and Na-doped
SnSe₁₋ₓTeₓ polycrystals. Phys. Chem. Chem. Phys. 17, 30102-30109 (2015).

14. Li, Y., Shi, X., Ren, D., Chen, J. & Chen, L. Investigation of the anisotropic
thermoelectric properties of oriented polycrystalline SnSe. Energies 8,
6275-6285 (2015).

15. Leng, H.-Q., Zhou, M., Zhao, J., Han, Y.-M. & Li, L.-F. The thermoelectric
performance of anisotropic SnSe doped with Na. RSC Adv. 6, 9112-9116
(2016).

16. Leng, H., Zhou, M., Zhao, J., Han, Y. & Li, L. Optimization of thermoelectric
performance of anisotropic AgₓSn₁₋ₓSe compounds. J. Electron. Mater. 45,
527-534 (2016).

17. Han, Y.-M. et al. Thermoelectric performance of SnS and SnS-SnSe solid
solution. J. Mater. Chem. A 3, 4555-4559 (2015).

18. Serrano-Sanchez, F. et al. Record Seebeck coefficient and extremely low
thermal conductivity in nanostructured SnSe. Appl. Phys. Lett. 106, 083902
(2015).

19. Bera, T., Sanchela, A. V., Tomy, C. V. & Thakur, A. D. n-type SnSe₁₋ₓ for
thermoelectric application. Preprint at https://arxiv.org/abs/1601.00753
(2016).

20. Chere, E. K. et al. Studies on thermoelectric figure of merit of Na-doped p-type 
polycrystalline SnSe. J. Mater. Chem. A 4, 1848-1854 (2016). 

21. Ortiz, B. R. et al. Effect of extended strain fields on point defect phonon 
scattering in thermoelectric materials. Phys. Chem. Chem. Phys. 17, 
19410-19423 (2015). 


Supplementary Information is available in the online version of the paper. 


Author Contributions P.-C.W. and Y.Y.C. contributed towards the synthesis of 
fully dense SnSe single crystals and the thermoelectric measurements, which 
were verified by S.B., J.H., R.P., A.M.R. and S.N. This combined study helped 
us to determine that packing density has a substantial effect on the figure of 
merit of SnSe. 


Competing Financial Interests Declared none. 


doi: 10.1038/nature19832 


at room temperature and 0.3–0.4 W m⁻¹ K⁻¹ at 800 K. In hole-doped
SnSe crystals, where a different experimental density was measured
(closer to the theoretical 6.0 g cm⁻³), the lattice thermal conductivity
is in agreement with the value we report in ref. 2. If the experimental
density is adjusted 10% upwards to be closer to the theoretical value of
6.1 g cm⁻³, the thermal conductivity is still ultralow. Therefore, sample
density does not greatly affect the ultralow value of the lattice thermal
conductivity. Whether the thermal conductivity is intrinsic to SnSe is
not yet clear, but some explanations have recently been proposed.
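
The arithmetic behind the density adjustment mentioned above can be sketched as follows. This is only an illustration, assuming that the diffusivity and specific heat are held fixed so that κ = D C_p ρ scales linearly with ρ; the accompanying Comment disputes exactly this assumption, arguing that diffusivity and density are interdependent. The input values are the room-temperature total thermal conductivities and deduced density listed in Table 1 of the Comment, and the function name is ours.

```python
# Room-temperature total thermal conductivities from Table 1 of the
# accompanying Comment (W m^-1 K^-1).
kappa_reported = {"a": 0.46525, "b": 0.70014, "c": 0.67560}

RHO_DEDUCED = 5.43  # g cm^-3, density deduced in the Comment
RHO_TARGET = 6.1    # g cm^-3, theoretical value quoted in this Reply


def rescale_kappa(kappa, rho_old, rho_new):
    """Rescale kappa = D * C_p * rho to a new density, holding D and C_p fixed.

    Assumption: D does not change with density, which the Comment disputes.
    """
    return kappa * rho_new / rho_old


kappa_rescaled = {
    axis: rescale_kappa(k, RHO_DEDUCED, RHO_TARGET)
    for axis, k in kappa_reported.items()
}

for axis, k in kappa_rescaled.items():
    print(f"{axis} axis: {kappa_reported[axis]:.3f} -> {k:.3f} W/(m K)")
```

Under this linear assumption the rescaled room-temperature values stay below 0.8 W m⁻¹ K⁻¹ along all three axes.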

The sample prepared by Wei et al. and described in their Comment 
was grown under different conditions, the details of which are not 
known to us. Must an ‘intrinsic’ value refer to a material with exactly 
stoichiometric composition, vacancy-free, dislocation-free, strain- 
free, fully dense, and with the number of excited electrons equal to the 
number of holes? Or could an ‘intrinsic’ value refer to the most stable 
state of a material for a given set of experimental growth conditions? 
The SnSe sample of a recent publication has an Sn vacancy rate of
several per cent, which is consistent with our observations of a large
off-stoichiometry of Sn:Se (0.835:1) in SnSe crystals prepared using
the Bridgman method; however, no off-stoichiometry was observed
in SnSe crystals synthesized using a vapour-transfer method. Thus we
consider it unfair to conclude, as do Wei et al. in their Comment, that
our value for lattice thermal conductivity is not ‘intrinsic’ to SnSe, on
the basis of the sample density only.

The true measure of the thermal conductivity lies in the thermal
diffusivity data, which show very low values. The question of
interest concerning SnSe is whether or not it has an ultralow thermal
conductivity, and most reported values for samples of SnSe suggest
that it does. The reported values have a spread, but this is not
surprising given (1) the challenges in measuring materials with ultralow
thermal conductivity, (2) variations from sample to sample with respect
to defects, phase-transition cracks, and so on, and (3) several variations
in the synthesis of the material. Of all reports on polycrystalline SnSe
samples (refs 2, 3, 7, 8, 10–36) at room temperature, 55% report higher
thermal conductivity than in the single crystal², 25% report similar
values, and 20% report a lower thermal conductivity. At 800 K, about
15% report higher thermal conductivity than in the single crystal²,
80% report similar values, and 5% report a lower thermal conductivity.
Stones that could cause ripples


Monkeys have been observed pounding stones and unintentionally forming sharp-edged, tool-like fragments. This 
deliberate breakage raises questions about the evolution of intentional stone modification. SEE LETTER P.85 


HELENE ROCHE 


Of the non-human primates whose
manipulative skills and tool use have
been studied, capuchin monkeys
(Sapajus libidinosus) seem to be the most 
Although it has been known for more than 
25 years that capuchins use tools¹, a key development
in our understanding of their skills
was the discovery² that, like chimpanzees,
capuchins use stone tools for tasks such as
cracking open nuts. On page 85, Proffitt et al.³
report an interesting stone-breaking behav- 
iour observed among wild capuchins of Serra 
da Capivara National Park in eastern Brazil. 

The authors observed capuchins deliberately 
damaging stones by using a hand-held 
quartzite cobble stone to violently hammer 
other quartzite cobbles embedded in a large 
stone conglomerate structure (Fig. 1). The 
stone-smashing activity (captured on video 
by Proffitt and colleagues) causes several 
types of stone modification. For example, 
the action can leave specific impact marks on 
the two cobbles, or the hand-held hammer- 
stone can fracture, creating chunky pieces or 
thinner detachments of sharp-edged stone 
flakes. The small concentrations of modified, 
broken and flaked stone pieces (known as a 
lithic assemblage) created by the capuchins’ 
stone-breaking activity could potentially be 
mistaken for similar lithic assemblages made 
by early hominins. 

Proffitt et al. describe the capuchin- 
produced lithic assemblages as being created 
in an unintentional way, because they are a 
coincidental consequence of the rock smash- 
ing. The authors report that the capuchins have 
never been seen using, or even seemed to be 
interested in, the stone flakes they produce 
(see Supplementary Video 1 in Proffitt et al.). 
However, Proffitt et al. consider that their 
capuchin findings are relevant to early-human 
studies because the process of stone knapping 
(striking stones together to make a desired 
stone product) is no longer a hominin-specific 
activity. Previous evolutionary explanations 
for the origins of hominin intentional stone 
modifications have focused on hominin-spe- 
cific advances, such as changes in hand shape, 
coordination and cognitive skills. Explaining 
the origins of intentional stone modifications 




Figure 1 | Capuchin monkeys can deliberately break stones. Proffitt et al.³ observed wild capuchin
monkeys smashing stones together in Serra da Capivara National Park in Brazil. This process creates
stone fragments, including some that have sharp edges and could potentially be used as tools — although 
the monkeys have not been observed to show interest in using these fragments. 


by early hominins in light of Proffitt and 
colleagues’ interpretation of their results would 
therefore require alternative hypotheses. 

To understand the purpose of a given stone-
modification activity, a key factor to consider 
is intention, a concept in the domain of cog- 
nitive psychology that implies having a goal 
to reach, involving planning and forethought. 
Similar stone-breakage outcomes can occur 
through activities that are either unintentional 
for the capuchins or intentional in the case of 
early hominins. 

The stone-breaking action itself is a physical 
process subject to the permanent constraints 
of solid-state physics. Hard-rock conchoi- 
dal fracture, which produces flakes rather 
than shapeless chunks, is governed by the
mechanical laws of brittle-material fracture.
Conchoidal fracture occurs when two stones 
are struck together at a tangential angle using 
a blow strength adapted to the stone density. 
Such fractures can occur naturally by chance. 
Animals can smash rocks together in the com- 
plex action required to produce conchoidally 
fractured fragments, regardless of whether or 
not the stone-smashing activity was carried 
out with the intention of stone modification. 
The ability of New World capuchins to 
unintentionally produce lithic assemblages 
should not make us suspicious about any 
archaeological stone assemblage considered 
to belong to the African Early Stone Age. Our 
knowledge about technical behaviours and 
stone knapping in early-human archaeological 


T. FALOTICOT 


sites has a solid foundation. It represents 
decades of continuous research in the African 
continent that has provided hundreds of lithic 
assemblages, dated from as early as 3.3 million
years ago⁴ and retrieved from sites that
have a firmly established geological context. 
The African Early Stone Age archaeological 
record is based on a large body of evidence, 
including other contextual site data such as 
time-dating information and fossil analysis to 
consider together with the lithic assemblages. 
However, primate rock modification might 
bring new insight to the long-running archae- 
ological debate about the origins and nature of 
modified stones identified from the American 
continent during the Late Pleistocene epoch 
(between 40,000 and 20,000 years ago). 
What early hominins and capuchins are 
doing to the stones is a goal-oriented action, 
but not with the same intention. In Early Stone 
Age archaeology, even if the function of all 
hominin stone tools is not known, stone-strik- 
ing activities are mainly thought to be linked 
to intentional recurrent production of sharp- 
edged flakes. Other pounding activities pro- 
ducing less-distinctive stone artefacts might 
have occurred, but are less easy to demon- 
strate. Some stone tools from archaeological 
Early Stone Age sites have physical indications 


DNA REPAIR 


of functional tool usage. Stone tools are often 
associated with faunal remains, such as bones 
bearing defleshing cut-marks inflicted by a 
sharp-edged tool, or the tools can have impact 
marks indicating breakage caused by a pound- 
ing tool’®. 

The goal of the capuchins’ stone-pound- 
ing described by Proffitt and colleagues is 
unknown. The only obvious linked action 
is the monkeys’ systematic licking of the 
pounded quartzite stone, which occurs after 
almost every blow to the pounded stone 
surface. Discussing the possibility of ingestion 
of nutritional components in the powdered 
rocks, the authors mention the trace nutrient 
silicon. However, silicon’s essential role dur- 
ing connective-tissue formation and calcium 
deposition is at an early stage of bone calci- 
fication", yet one of the capuchins videoed 
in action by Proffitt and colleagues seems to 
be a young adult. Nevertheless, the benefit 
of ingesting powdered quartz is a possibility 
to consider. 

The same research group recently demon- 
strated’* that the capuchin monkeys of Serra da 
Capivara National Park have used quartz- 
ite cobbles to pound open nuts for at least 
600 or 700 years. The capuchins might also 
have been engaged in unintentional knapping 


Telomere-lengthening 
mechanism revealed 


Shortening of the ends of chromosomes limits a cell’s lifespan. Some cancer cells 
avoid this fate through a mechanism called alternative lengthening of telomeres, 
molecular details of which have now been defined. SEE ARTICLE P.54 


CAITLIN M. ROAKE & STEVEN E. ARTANDI 


During cell division, the genome is
duplicated by DNA polymerase
enzymes. However, each chromosome’s
ends are incompletely replicated during
duplication, because DNA polymerases require 
an RNA primer 5’ to the region being syn- 
thesized. This means that the repetitive DNA 
sequences called telomeres that cap the ends of 
chromosomes shorten at each division, and this 
shortening limits the replicative lifespans of 
most cells. During cancer development, cells 
acquire the ability to divide indefinitely by 
circumventing telomere shortening — either 
by upregulating the enzyme telomerase, 
which extends telomeres, or by activating a 
mechanism termed alternative lengthening of 
telomeres (ALT), which is based on a common
method of DNA repair, homologous recombination.
On page 54, Dilley et al.¹ reveal a


mechanism that underlies ALT and identify 
an unusual DNA polymerase that mediates 
this process. 

During homologous recombination, a 
double-strand break in the DNA of one chro- 
mosome is repaired by a DNA polymerase 
using template DNA that is taken from a 
matching sister chromatid — an identical 
DNA molecule generated during replication. 
Cancer cells that use ALT often show higher 
levels of DNA damage at telomeres than do 
non-ALT cells’, which may predispose them 
to use homologous recombination to repair 
breaks in telomeric DNA. In human ALT 
cancer-cell lines, evidence of enhanced homol- 
ogous recombination at telomeres includes an 
increase in telomere exchanges between sister 
chromatids compared to other cell lines’, and 
evidence for the copying of DNA tags from one 
chromosome end to another’. An estimated 
10-15% of tumours use ALT to maintain their 
for an equivalent or even longer period of
time. Investigating the antiquity of the stone- 
smashing behaviour or trying to determine the 
behaviour’s function and possible role in capu- 
chin evolution are some of the many promising 
fields of research rippling out from the shattering 
discovery by Proffitt and colleagues.
telomeres, making this process an important
target for cancer therapy”®. However, dis- 
section of the molecular mechanisms that 
underlie ALT has been challenging. 

There are two ways in which telomeres 
might use homologous recombination to 
maintain their length. In the first model, 
unequal exchange of DNA between sister telo- 
meres creates a longer and a shorter telomere, 
one of which is inherited by each daughter 
cell. The cells that inherit longer telomeres 
eventually outcompete those that have shorter 
telomeres. In the second model, which is 
increasingly gaining favour, telomeric DNA 
is synthesized using an existing piece of telo- 
meric template, either from another telomere 
or from free molecules of repetitive DNA 
called extrachromosomal telomeric DNA that 
are found in ALT cells’. 

Dilley et al. provide strong support for the 
second model. To encourage homologous 
recombination in ALT cells, the authors exploit 
a system they had previously engineered’ to 
create targeted double-strand breaks in telomeres
by fusing the FokI nuclease enzyme,
which cleaves DNA, to the telomere-binding
protein TRF1. They observe a tenfold increase
in telomeric DNA synthesis after TRF1–FokI
induction in cells known to use ALT.
Furthermore, they show that this synthesis 
is unidirectional and processive — capable of 
synthesizing long tracts of telomere repeats 
typically 20 kilobases long (the length of an 
average ALT telomere). The kinetics of this 
synthesis are consistent with those that could

Figure 1 | Break-induced telomere synthesis. Some cancer cells maintain the telomeric DNA sequences
that cap the ends of their chromosomes through a mechanism dubbed alternative lengthening of telomeres
(ALT). Dilley et al.’ outline a process called break-induced telomere synthesis, which underlies ALT. a,
Double-stranded DNA breaks can arise in telomeres for several reasons, including telomeric shortening
during cell division or changes to the way in which telomeric DNA is packaged as chromatin. b, In ALT cells,
the protein complex RFC1-5 rapidly binds to double-strand breaks in telomeres. c, RFC1-5 recruits the
protein PCNA and the DNA polymerase enzyme POLδ to the break. This protein complex synthesizes long
tracts of DNA using a complementary strand of DNA as a template from which to replicate the telomere. 


cause large fluctuations in telomere length, as 
is seen in ALT cells®. 

The characteristics and kinetics of this 
DNA synthesis match those of a phenomenon 
called break-induced replication, which is a 
telomere-maintenance mechanism in yeast 
strains that lack telomerase’. Break-induced 
replication is a form of homologous recom- 
bination that initiates DNA replication when 
only one end of a double-strand break shares 
sequence similarity with a template. Dilley 
et al. term this process in mammalian ALT 
cells break-induced telomere synthesis. 

The authors next set out to characterize the 
proteins responsible for break-induced telomere 
synthesis. The protein Rad51 has a key role in 
homologous recombination, and is required 
for break-induced replication in yeast”. But, 
surprisingly, Dilley and colleagues found that 
Rad51 was dispensable for break-induced telo- 
mere synthesis in ALT cells. Rather, a complex 
that consists of the polymerase POLδ and the
proteins PCNA and RFC1-5 is found at sites 
of DNA damage in ALT cells and is required 
for break-induced telomere synthesis (Fig. 1). 
The authors theorize that this atypical complex 
is responsible for the dominant pathway of 
telomere synthesis in ALT cells. 

Although Dilley et al. shed light on the 
mechanisms underlying ALT in cancer cells, 
their findings also open up new questions. For 
instance, the authors demonstrated that they 
could trigger break-induced telomere synthe- 
sis in both ALT and telomerase-producing 
cells, so why is this method of telomere rep- 
lication not operative in most cancer cells? It 
is unclear what induces the ALT mechanism 
and how that mechanism is specifically sus- 
tained in the 10-15% of cancer cell types that 
use ALT. The authors provide one possible 
explanation — that ALT cells have higher 
rates of persistent telomere damage than other 
cancer cells. 

Alternatively, it might be that there is a 
change in the way in which telomeric DNA 


is packaged around histone proteins to form 
chromatin. Disruption of histone function 
has been shown” to induce ALT-like char- 
acteristics in cells, suggesting a mechanistic 
link between altered telomere histones and 
the ALT mechanism. Moreover, mutations 
in a chromatin-remodelling protein complex,
ATRX-DAXX, are highly recurrent
in human ALT tumours’. The current 
work does not address how mutations in 
the ATRX-DAXX complex lead to ALT, but 
this will be an interesting avenue for further 
investigation. 

Dilley and colleagues’ link between
break-induced telomere synthesis and
ALT provides insights that might help us to
further understand how ALT is initiated and
maintained in human cancer cells. In the
future, a more in-depth understanding of these
processes might lead to the development of
therapies targeting human cancers that depend
on ALT. ■

FLUID DYNAMICS

Turbulence in a
quantum gas


The discovery of a cascade of sound waves across many wavelengths in an 
ultracold atomic gas advances our understanding of turbulence in fluids 
governed by quantum mechanics. SEE LETTER P.72 


BRIAN P. ANDERSON 


Microscopic droplets of ultracold
atomic gases known as Bose-Einstein
condensates might seem
among the least-useful substances for studies of
turbulence. These droplets typically contain 
about a billion times fewer atoms than are 
found in a single human cell, and exist at tem- 
peratures of a few billionths of a degree above 
absolute zero’. At these temperatures, the 
motion of the atoms is incredibly slow. Can 
turbulence really be induced, sustained and
probed in these ultracold atomic gases? Indeed
they can, and such turbulence studies are 
gaining traction, linking atomic and quantum 
physics with classical fluid dynamics’. On 
page 72, Navon et al.’ report observations of a 
Bose-Einstein condensate as it is driven into a 
turbulent state and find evidence for a cascade of 
wave-like excitations, opening up new possibili- 
ties for exploring the universality of turbulence. 

Chief among the features that are shared by 
turbulent fluids is a characteristic distribution 
of kinetic energy between the components 
of the fluid that have different momenta. To 


picture such a distribution, imagine slowly 
adding cream to coffee and stirring so that 
the cream is quickly mixed in. The stirring 
drives energy into low-momentum (long- 
wavelength) flows, which might be temporar- 
ily visible as the cream traces large eddies in the 
coffee. The liquids are soon mixed and these 
flows are no longer easily identifiable. 

With continued stirring, more energy is 
injected into low-momentum flows, and the 
energy initially in these flows is transferred to 
higher-momentum flows because of the non- 
linear dynamics of the fluid. Eventually, energy 
from high-momentum flows is dissipated as 
heat because of the viscosity of the coffee. By 
continuously stirring the coffee, a cascade of 
excitations can be achieved: energy is injected 
into the system, transferred through the dif- 
ferent momentum components, and finally 
dissipated (Fig. 1). 

Identifying cascades of excitations is a cen- 
tral goal of turbulence studies. These cascades 
correspond to a power-law dependence of a
dynamic quantity such as energy on the wave- 
number, a quantity that is proportional to the 
magnitude of the momentum. Such signa- 
tures of turbulence have been experimentally 
confirmed in countless systems, including 
superfluid helium*” (liquid helium held at 
temperatures so low that it has zero viscosity). 
Navon and colleagues are the first to identify 
turbulent cascades of wave-like excitations in 
an ultracold atomic gas. 

The authors’ experiment uses a Bose- 
Einstein condensate (BEC) that consists 
of about 10° rubidium atoms. The atoms 
are trapped in a 3D cylindrical ‘box’ about
30 micrometres long, with walls formed by 
laser light. This trap gives the BEC a uniform 
density, which ensures that the characteristics 
of the turbulent cascade are the same through- 
out the condensate. 

Navon et al. use a time-dependent magnetic 
field to shake the cloud of atoms, injecting 
energy into the low-momentum modes of 
the BEC. Rather than directly measuring the 
kinetic-energy spectrum, the authors deter- 
mine the momentum distribution: the fraction 
of atoms that have a value of the momentum 
within any given range. They find that, for 
short shaking times, the majority of the atoms 
are in low-momentum modes and few atoms 
are in high-momentum modes. 

After further shaking, interactions between 
the atoms, which lead to nonlinear dynam- 
ics of the BEC, push the atomic population 
into higher-momentum modes. Contin- 
ued shaking then replenishes the source of 
low-momentum excitations, and any atoms 
that populate high-momentum modes are 
lost from the trap. Finally, after a total shak- 
ing time of about 2 seconds, the authors find 
that a steady cascade of excitations has been 
established. This is revealed by the power- 
law dependence of the momentum distribu- 
tion on the wavenumber, and represents the

Figure 1 | A turbulent cascade. Navon et al.’ find evidence for a cascade of wave-like excitations in an
ultracold atomic gas. This simple schematic illustrates how energy in low-momentum (long-wavelength) 
excitations cascades to high-momentum (short-wavelength) excitations. First, energy is injected into the 
system, manifesting as excitations of the low-momentum sound modes of the fluid. Because of the fluid’s 
nonlinear dynamics, energy is transferred to successively higher-momentum modes. Eventually, energy is 
dissipated as heat from high-momentum excitations (in a typical classical fluid), or high-momentum atoms 
are lost from the system (in the authors’ experiment). Navon and colleagues show that excitations covering a 
wide range of momenta are present simultaneously in their system — a characteristic feature of turbulence. 


primary result of the authors’ study. 

Navon and collaborators’ measuring 
technique involves first suddenly removing 
the box that confines the atoms, then allowing 
the atom cloud to expand freely and, finally, 
acquiring a 2D image of the cloud. For tur- 
bulent fluids, the atoms have enough kinetic 
energy that the interactions between them are 
comparatively weak. The images of the cloud 
therefore provide scaled representations of the 
momentum distributions that existed before 
the box was removed. 

The authors have expertly tackled one aspect 
of measuring turbulence in ultracold atomic 
gases, yet many problems remain to be solved 
before turbulence in these systems can be fully 
understood. For instance, few theoretical stud- 
ies have considered analytically the power-law 
dependence of the momentum distribution for 
a turbulent atomic fluid. The authors obtain a 
momentum distribution that is proportional to 
the wavenumber raised to the power of −3.5.
This is close to the predicted value of −3 for
weakly interacting waves in a turbulent com- 
pressible superfluid’. But the underlying physi- 
cal mechanisms that give rise to the difference 
between these values are not understood. 
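
As a rough illustration of how a cascade exponent such as −3.5 can be read off a momentum distribution, the sketch below fits a straight line to log(n) against log(k). The data are synthetic and generated inside the example; the function name, the noise level and the range of wavenumbers are all assumptions made for illustration and do not reproduce the authors' analysis.

```python
import math
import random


def fit_power_law_exponent(k_values, n_values):
    """Return the least-squares slope of log(n) against log(k).

    If n(k) is proportional to k**gamma, the slope is an estimate of gamma.
    """
    xs = [math.log(k) for k in k_values]
    ys = [math.log(n) for n in n_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic momentum distribution (hypothetical values, arbitrary units):
# an exact power law with exponent -3.5, multiplied by mild random noise.
rng = random.Random(0)
true_exponent = -3.5
k_values = [1.0 + 0.25 * i for i in range(40)]
n_values = [k ** true_exponent * math.exp(rng.gauss(0.0, 0.05)) for k in k_values]

fitted = fit_power_law_exponent(k_values, n_values)
print(f"fitted exponent: {fitted:.2f}")
```

On a log-log plot a power law appears as a straight line, which is why a simple linear fit is enough here; comparing the fitted slope with a predicted value such as −3 is then a matter of comparing two numbers and their uncertainties.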

It will also be helpful to extend the cascade 
over a range of wavenumbers that is larger 
than that achieved in this experiment: this 
would allow more flexibility in probing and 
understanding the nonlinear dynamics of 
turbulent cascades. Finally, the presence 
of vortices (localized regions of circulating 
fluid about a fluid-free core) in this experi- 
ment was inferred only through simulations. 
In general, the amount of energy involved 


in vortex excitations compared with sound 
excitations can greatly affect momentum or 
energy distributions’. Future measurements 
of vortex dynamics should help researchers to 
develop a better understanding of turbulence 
in quantum atomic fluids. 

Navon and colleagues’ experiment is a 
crucial step in the further establishment of 
trapped ultracold atomic gases as systems for 
studying turbulence. Of broader importance 
is the contribution of their work to the grow- 
ing set of techniques for experimentally and 
theoretically probing turbulence in fluids 
whose dynamics are governed by quantum 
mechanics. With substantial theoretical chal- 
lenges to overcome, the discovery of previ- 
ously unknown links between turbulence and 
quantum mechanics is one of the most exciting 
prospective outcomes of these studies. ■

Phosphate on, rubbish out


A previously unknown way in which cells mark proteins for destruction has been 
found in bacteria — phosphorylation of the amino acid arginine targets proteins 
for degradation by protease enzymes. SEE ARTICLE P.48 


ARTI TRIPATHI & SUSAN GOTTESMAN 


Bacteria are expert at adapting to changing
lifestyles. Dealing with damaged or
misfolded proteins is a key challenge
during adaptation to stressful conditions 
such as high temperature. In such situations, 
chaperone proteins restore unfolded proteins 
to their functional, folded states and protease 
enzymes degrade damaged proteins’. But how 
are proteins that have been damaged beyond 
redemption recognized and sent for degrada- 
tion? On page 48, Trentini et al.” show that a 
surprisingly simple degradation tag — phos- 
phorylation of the amino acid arginine — has 
a central role in the handling of heat-damaged 
proteins in the bacterium Bacillus subtilis. 
Protein degradation in eukaryotes and 
prokaryotes (organisms with or without a 
cellular nucleus, respectively) is carried out by 
protease-enzyme complexes’, which require 
the nucleotide ATP for their activity. Access 
to the protease must be controlled to ensure
that only damaged or unwanted proteins
are degraded. The protease active site is in a 
chamber that is accessible only to proteins 
that have been unfolded and moved there by 
the ATPase-enzyme portion of the protease- 
enzyme complex (Fig. 1). Thus, binding and 
unfolding of proteins by the ATPase governs 
degradation selectivity. 

The selection process for protein degra- 
dation often depends on the attachment of 
molecular ‘tags’ to the protein to be degraded. 
For example, eukaryotes might attach the 
polypeptide ubiquitin to proteins targeted 
for degradation by proteases’. Tagging is also 
found in prokaryotic archaea and mycobac- 
teria’, which use a non-ubiquitin polypeptide 
tag (Fig. 1). 

However, most bacteria select proteins for
degradation using mechanisms other than tagging.
One such mechanism is the exposure of
an amino-acid motif known as a degron’, an 
intrinsic part of a protein. Changes in bind- 
ing partners of proteins in multi-protein

Figure 1 | Protein destruction by protease enzymes. All cells require mechanisms that selectively
target proteins for destruction by protease enzymes. Some of the possible mechanisms known to act
in prokaryotes (cells without a nucleus) are shown here. A protein can be targeted for destruction
if it exposes an amino-acid motif known as a degron, or if the protein is recognized and bound
specifically by an adaptor protein or modified through the direct attachment of a polypeptide tag
molecule. Trentini et al.” have identified a previously unknown type of protein-destruction tag in 
bacteria — phosphorylation (P) of some of the protein’s arginine (Arg) amino acids. Once marked for 
destruction, a protein transits to the protease-enzyme complex. The protein is recognized and unfolded 
by an ATPase enzyme and then enters the inner protease chamber, where it is destroyed. 


38 | NATURE | VOL 539 | 3 NOVEMBER 2016 
© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


complexes can hide or reveal degrons. Proteins 
can also be targeted for degradation if they are 
bound and specifically delivered to the pro- 
tease by an adaptor protein, which provides 
a supporting role in the degradation process, 
rather than a direct enzymatic one. Modifica- 
tions or changes in the availability of adaptors 
and protein-binding partners can determine 
whether or not a specific protein is degraded 
at a particular time. 

Although the mechanisms that target 
proteins for destruction are well understood, 
how damaged proteins are recognized is less 
clear. When proteins are misfolded as a result 
of heat or oxidation, they might expose an 
internal degron that is normally shielded’”, 
resulting in the protein’s destruction. However, 
little is known about other possible mecha- 
nisms, and Trentini and colleagues’ study 
provides additional mechanistic insights. 

McsB, a B. subtilis protein that is involved 
in the heat-shock response’, was identified as 
a protein-kinase enzyme that phosphorylates 
arginine”'’. McsB-kinase phosphorylation 
promotes degradation of the protein CtsR 
(refs 11, 12), which represses expression of 
heat-shock genes. A previous study'® had 
found that more than 100 proteins under- 
went McsB-dependent phosphorylation, and 
Trentini and colleagues decided to test whether 
arginine phosphorylation might cause protein 
degradation. 

The authors investigated degradation by 
the ClpCP-protease complex in B. subtilis. 
This complex consists of the ClpP protease 
and ClpC, which is an ATPase and unfol- 
dase enzyme. Under heat-shock conditions, 
Trentini et al. used an inactive version of ClpP 
to trap proteins undergoing degradation, and, 
by means of proteomic techniques, identified 
20 different trapped proteins that carried a 
phosphorylated arginine tag; they estimate 
that 25% of ClpP substrates are marked by 
arginine phosphorylation. 

To demonstrate the role of arginine 
phosphorylation in degradation, the authors 
turned to in vitro biochemical studies involv- 
ing purified proteins. Previous investigations 
of protein degradation by ClpCP found that 
the process involved adaptor proteins, and 
McsB was thought to be an adaptor protein”. 
Trentini et al. found that casein (an unfolded 
protein commonly used to study protein deg- 
radation) could be degraded by ClpCP, but that 
degradation required either the ClpC adaptor 
protein MecA or McsB. However, phosphoryl- 
ated casein was degraded by ClpCP without 
the action of an adaptor protein, suggesting 
that the protease recognizes the phosphoryla- 
tion tag directly. Consistent with a model in 
which the ATPase contains a binding site for 
phosphorylated arginine, the authors found 
that a free molecule of phosphorylated argi- 
nine bound to ClpC. 

The amino-terminal domain of most mem- 
bers of the bacterial AAA family of unfoldases 


is involved in binding adaptor proteins or 
proteins to be degraded’. Trentini et al. obtained 
an X-ray crystal structure of the N-terminal 
domain of ClpC in the presence of a phospho- 
rylated arginine. They identified a binding site 
for phosphorylated arginine in ClpC that could 
simultaneously accommodate the positively 
charged arginine and the negatively charged 
phosphoryl group of the phosphate. 

The binding site in ClpC for phosphorylated 
arginine partially overlaps with the binding site 
for the MecA adaptor. Trentini and colleagues 
mutated two amino acids in the N terminus 
of ClpC, thereby abolishing its binding of 
phosphorylated arginine, but still enabling 
MecA-mediated protein binding. This allowed 
the authors to test directly whether protease 
degradation of proteins that have phosphoryl- 
ated arginines is a necessary part of the heat- 
shock response. They found that cells lacking 
ClpC or containing a mutant form of ClpC 
that does not bind to phosphorylated arginine 
had a similarly reduced ability to recover from
heat-shock stress. This suggests that, during 
heat shock, ClpC is required to recognize 
and degrade proteins marked by arginine 
phosphorylation. 

Integrating these findings into our
understanding of heat-shock regulation in
bacteria will require more work. It is not clear 
what changes occur in proteins or McsB at 
high temperatures that lead to increased argi- 
nine phosphorylation. However, McsB and 
the phosphorylated-arginine binding site in 
ClpC are evolutionarily conserved across many 
members of a group known as Gram-positive 
bacteria, so perhaps arginine phosphoryla- 
tion might be a widely used mechanism to tag 
proteins for destruction. Intriguingly, McsB is 
necessary for Staphylococcus aureus to act as a 
pathogen”. 

Future experiments should try to unravel 
the relative importance of the degradation of 
misfolded proteins compared with the deg- 
radation of regulatory proteins during stress 
survival. Perhaps the machinery for adding 
and recognizing phosphorylated arginine on 
proteins will provide new drug-development 
targets, for example in S. aureus. 

More broadly, the many biological roles of 
phosphorylation have now been expanded 
even further. The ability of this modification 
to mark proteins for degradation, coupled with 
the probable perturbation of protein function 
by the addition of a negatively charged phos- 
phate group to a positively charged arginine, 


Sound and meaning in 
the world’s languages 


The sounds of words that represent particular meanings are usually thought to 
vary arbitrarily across languages. However, a large-scale study of languages 
finds that some associations between sound and meaning are widespread. 


W. TECUMSEH FITCH 


Plato’s dialogue Cratylus begins with a
debate. Socrates is asked whether the
sounds of words are simply arbitrary
conventions, as Hermogenes suggests, or
if sounds are reflective in some way of their
meaning, as Cratylus proposes. Socrates argues
for the latter option, holding that although
many words have arbitrary relations to their
meanings, ‘good’ words are distinguished by a
correspondence between sense and sound:
their sound somehow suits their meaning.
However, most researchers today accept the
linguist Ferdinand de Saussure’s updated
version’ of Hermogenes’ viewpoint that the
connection between sound and meaning in
language is essentially arbitrary, with a few
minor exceptions.

Writing in Proceedings of the National
Academy of Sciences, Blasi et al.’ address this old
debate by exploring an unprecedented number
of languages for indications of associations
between sound and meaning, uncovering sub- 
stantial data that support Socrates’ viewpoint. 
The authors find that, although most words 
vary arbitrarily, certain associations between 
speech sounds and the meanings of the words 
that contain them surface time and again 
among populations spread across the globe. 
There are many categories of sound—meaning 
correspondence. The most easily recogniz- 
able form is termed onomatopoeia, in which 
the sound of a word corresponds to the sound
made by an animal or object. For example, bird 
names such as cuckoo or chickadee represent 
phonetic attempts to imitate the character- 
istic sounds made by these birds. However, 
even here a certain amount of arbitrariness is 
present. The crowing of a cock is ‘cock-a-doodle-doo’
in English but ‘kikeriki’ in German.
Overlapping somewhat with onomatopoeia 
are what are known as ideophones. The sound 
of the entire word of an ideophone has a role
suggests that Trentini and colleagues’ work
might be relevant far beyond the B. subtilis
bacteria they have studied. ■
in conveying an iconic meaning that can go
beyond simply imitating sounds. Such ideophones
are uncommon in English, but one
example is the comic-book word ‘kapow’. Yet
in many other languages, including Korean, 
Japanese and the African language Ewe, there 
are large numbers of ideophones, which are 
frequently used in expressive speech — for 
example, the Japanese term ‘doki doki’ expresses 
the thumping heartbeat of excitement. 

Finally, the most pervasive form of sound- 
meaning correspondence is termed sound 
symbolism, referring to the situation in 
which some part of a word or words has a 
non-arbitrary association between sound 
and meaning, but other components are 
arbitrary. A well-known example is the use of 
different vowels to represent size: the /i/ vowel 
sound (pronounced ‘ee’) in the word ‘teeny’ 
or ‘weeny’ symbolizes small size, whereas the 
/u/ vowel sound (pronounced ‘oo’) in ‘huge’ or
‘gargantuan’ indicates large size. Similarly, in
the famous bouba-kiki shape-sound effect’,
words for sharp, angular objects tend to have
consonant sounds such as ‘k’, whereas smooth,
rounded objects are associated with conso- 
nants such as ‘r’ and ‘b’ (Fig. 1). Both of these 
phenomena have been extensively studied 
experimentally and seem to be robust across 
cultures and age groups’ *. 

Other potential examples in English include 
systematic sound similarities in words possess- 
ing related meanings. For example, the sound 
‘gl-’ often indicates shiny visual phenomena 
(such as glitter or gleam), whereas ‘sl-’ occurs 
frequently in words with negative connotations

Figure 1 | The bouba-kiki phenomenon. When asked which of the shapes shown are probably named
‘bouba’ or ‘kiki’ in an unfamiliar foreign language, people consistently choose the rounded shape (blue)
for bouba and associate the sharp-edged shape (green) with kiki’ (bouba and kiki are invented words).
This link between the consonants ‘b’ and ‘k’ and round and sharp shapes, respectively, is consistently
found in speakers of different languages, and occurs when other invented words are tested’: for example,
‘baluma’ is also associated with the rounded shape. Blasi et al.’ find that such relationships between the
sound and meaning of words are more common and widespread than previously suspected. 


(like slime, sludge, slum or slander). The
non-arbitrary sound—meaning associations 
studied by Blasi et al. belong in the category of 
sound symbolism. 

Most previous studies of sound symbolism 
have explored fewer than 100 languages. Blasi
and colleagues started by assembling phoneti- 
cally transcribed word lists from an enormous 
variety of languages. They prepared more than 
6,000 lists of about 30 basic words and their 
meanings, as well as conducting a detailed 
examination of 328 word lists containing 100 
items. There is no clear dividing line between 
a ‘dialect’ and a ‘language’ — linguists often 
quip that a language is a dialect with an army 
and a navy. Nonetheless, these lists cover 
roughly 60% of the world’s languages and 85% 
of family groups of related languages (such 
as the Romance language-family grouping 
that includes Italian, French and Spanish). 
Although the lists are short, they encom- 
pass much of the basic vocabulary in each 
language and represent the most common 
nouns and verbs. 

Blasi et al. then searched these lists for 
systematic biases in the probability of particu- 
lar sound—meaning associations (controlling 
for the biases expected as a result of the overall 
occurrence of the sounds). They aimed to find 
robust, widespread sound-symbolic phenom- 
ena, and screened out associations found in 
only one or a few languages. They also aimed 
to take into account such factors as language 
relatedness, word length or language-specific 
constraints. 
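
The core idea of such a search can be sketched in a few lines: compare how often a sound occurs in the words for one concept with how often it occurs in words for other concepts. The example below is a deliberately simplified illustration. The word lists are invented, the function name is hypothetical, and the controls for language relatedness, word length and language-specific constraints described above are not implemented.

```python
def association_ratio(word_lists, concept, sound):
    """Compare a sound's frequency in words for `concept` with a baseline.

    word_lists maps each language to a {concept: word} dictionary.
    Returns (observed, baseline, ratio), where observed is the fraction of
    languages whose word for `concept` contains `sound`, and baseline is the
    fraction of all other words that contain it.
    """
    target_hits = 0
    target_total = 0
    other_hits = 0
    other_total = 0
    for words in word_lists.values():
        for meaning, word in words.items():
            contains = sound in word
            if meaning == concept:
                target_total += 1
                target_hits += contains
            else:
                other_total += 1
                other_hits += contains
    observed = target_hits / target_total
    baseline = other_hits / other_total
    ratio = observed / baseline if baseline > 0 else float("inf")
    return observed, baseline, ratio


# Invented word lists for four made-up languages (not real data).
toy_lists = {
    "lang_a": {"nose": "nasu", "stone": "kiro", "water": "alu"},
    "lang_b": {"nose": "niko", "stone": "tapa", "water": "wemi"},
    "lang_c": {"nose": "hana", "stone": "doru", "water": "sil"},
    "lang_d": {"nose": "pemo", "stone": "nalu", "water": "iko"},
}

observed, baseline, ratio = association_ratio(toy_lists, "nose", "n")
print(observed, baseline, ratio)
```

A ratio well above 1 would suggest a positive association and a ratio well below 1 a negative one, but a real analysis would also need a significance test and a correction for the fact that related languages do not count as independent observations.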

The result was a surprisingly long list of 
robust, widespread sound—meaning associa- 
tions. Reassuringly, Blasi and colleagues found 
that the vowel /i/ is widely associated with 
small size, and ‘r’ with roundness, as noted by
previous researchers’ °. Other positive associations
found by the authors also seem intuitive,
such as ‘n’ for nose, ‘l’ for tongue (as in
‘lingual’) and ‘m’ in words for mother or
breasts. Negative associations uncovered
included the observation that the sounds ‘b’
and ‘m’ are unlikely to be present in the word
for teeth. However, some of the associations
identified are particularly hard to fathom, such
as ‘s’ for dog (such as canis in Latin) or ‘a’ for
fish (like pescado in Spanish).

Where does this leave the debate between 
the views of Socrates and de Saussure? As 
de Saussure argued, most of the sounds in 
words remain arbitrary, but these data from 
Blasi and colleagues indicate that matches of 
sound to sense might be more widespread 
and pervasive than has been discovered by 
previous smaller-scale approaches. 

The underlying processes driving these
biases between sound and meaning remain
a topic for future research. It seems possible 
that some of these connections reflect the 
shared sensory and cognitive biology of our 
species, which over time filters out ‘bad’ words 
that don't sound right, and favours words that 
have a good fit between sense and sound. Over 
many generations, this filtering will result in 
selection of variant words and pronuncia- 
tions, leading to a process of cultural evolu- 
tion in which words obeying Socrates’ dictum 
— that good words have sounds that suit their 
meaning — will have a higher probability of 
persisting. This hypothesis can be easily tested 
in laboratory experiments exploring cultural 
evolution’ ”’, adding new fuel to the ancient 
arguments in Cratylus. ■

Axions exposed


Physicists are hunting for a particle called the axion that could solve two major 
puzzles in fundamental physics. An ambitious study calculates the expected mass 
of this particle, which might reshape the experimental searches. SEE LETTER P.69 


MARIA PAOLA LOMBARDO 


The strong nuclear force binds the
constituents of protons and neutrons
together and is well described by a
theory called quantum chromodynamics 
(QCD). Within experimental error, the strong 
force is time-symmetric — the behaviour of 
strong interactions does not change if the flow 
of time is reversed. However, the equations 
of QCD might contain a symmetry-violating 
term that can theoretically take any value. And 
nature has chosen to set this term to zero. Why 
is this? The leading explanation is that the term 
is not zero, but is cancelled out by the presence
of a ‘neutralizing’ particle called the axion,
which has yet to be discovered. Such a parti- 
cle could also solve a long-standing problem 
in cosmology by making up dark matter, the 
‘missing’ matter in the Universe. On page 69, 
Borsanyi et al.* calculate the expected mass 
of the axion with unprecedented accuracy, a 
result that could be useful for understanding 
the properties of this mysterious particle and 
directing strategies for its detection’. 

If axions exist, they would have been 
produced abundantly during the earliest 
moments of the Big Bang. Determining 
the expected mass of this particle therefore 
requires knowledge of the properties of QCD 


at extremely high temperatures. At such 
temperatures, the axion mass would be time- 
dependent and proportional to the square root 
of a quantity called the topological suscepti- 
bility. As the Universe expanded and cooled, 
the mass would have become constant. Con- 
sequently, a detailed understanding of both 
the thermodynamics of QCD (in particular, 
the temperature dependence of the topologi- 
cal susceptibility) and the expansion of the 
Universe is necessary to predict the present- 
day axion mass. 
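
The proportionality between the axion mass and the square root of the topological susceptibility can be made concrete with a short calculation. The sketch below assumes the standard relation in which the mass equals the square root of the susceptibility divided by the axion decay constant, and it assumes a zero-temperature value of about 75.5 MeV for the fourth root of the susceptibility. Neither the decay constant nor this number is given in the text above; both are introduced here only for illustration, as are the function and constant names.

```python
# Assumed zero-temperature value of chi**(1/4) in MeV (not from the article).
CHI_QUARTER_ROOT_MEV = 75.5


def axion_mass_micro_ev(f_a_gev, chi_quarter_root_mev=CHI_QUARTER_ROOT_MEV):
    """Axion mass in micro-electronvolts from m_a = sqrt(chi) / f_a.

    f_a_gev is the (hypothetical) axion decay constant in GeV.
    """
    sqrt_chi_gev2 = (chi_quarter_root_mev * 1e-3) ** 2  # sqrt(chi) in GeV^2
    mass_gev = sqrt_chi_gev2 / f_a_gev                  # mass in GeV
    return mass_gev * 1e9 * 1e6                         # GeV -> eV -> micro-eV


def decay_constant_gev(mass_micro_ev, chi_quarter_root_mev=CHI_QUARTER_ROOT_MEV):
    """Invert the relation: decay constant in GeV for a given mass in micro-eV."""
    sqrt_chi_gev2 = (chi_quarter_root_mev * 1e-3) ** 2
    return sqrt_chi_gev2 / (mass_micro_ev * 1e-15)


# A decay constant of 1e12 GeV gives a mass of roughly 5.7 micro-eV.
print(axion_mass_micro_ev(1e12))
```

Because the mass scales inversely with the decay constant under this assumed relation, a heavier axion corresponds to a smaller decay constant, and any mass window can be translated into a window for the decay constant with the second function.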

The topological susceptibility can be 
calculated by exploiting the lattice formula- 
tion of QCD. In this approach, space-time 
is considered to be a four-dimensional grid 
of discrete points, rather than a continuum. 
Elementary particles called quarks reside at 
these points, whereas gluons — the particles 
that bind quarks together — exist on the links 
between them. 

The next step is to determine how the system 
of quarks and gluons fluctuates between 
different ‘topological sectors. A topologi- 
cal sector is analogous to a ribbon that has 
a specific number of twists. In a space-time 
continuum, the ribbon is a closed loop and 
the number of twists is fixed — fluctuations 
between topological sectors cannot occur. 
However, in lattice QCD, the system is discre- 
tized. The ribbon is ‘broken’ into many pieces
and the number of twists can change — fluc- 
tuations between sectors can easily occur. The 
topological susceptibility can be computed 
by considering all of the sectors, but this 
will not be the correct result for continuous 
space-time. 

By contrast, pushing the system towards
the continuum limit implies that the fluctuations
will become increasingly unlikely. In the
continuum, the system will be ‘frozen’ in one
particular topological sector, and the susceptibility
can no longer be calculated. There are
further issues that need to be addressed in the 
calculation, but even at the simplified level of 
the above discussion, it is clear that studying 
the topological properties of QCD is a formi- 
dable challenge® ”. 
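
The freezing problem can be seen in a toy estimator. The sketch below assumes the standard definition χ = ⟨Q²⟩/V, where Q is the integer topological charge of a sampled configuration and V is the space-time volume; the sample lists and the volume are hypothetical, and this naive average is not the fixed-sector method used by the authors.

```python
from statistics import fmean

def topological_susceptibility(charges, volume):
    """Estimate chi = <Q^2> / V from sampled topological charges Q."""
    return fmean(q * q for q in charges) / volume

# Hypothetical samples: one run that hops between sectors, one stuck in Q = 0.
fluctuating = [0, 1, -1, 2, 0, -2]
frozen = [0, 0, 0, 0, 0, 0]
volume = 16.0  # hypothetical four-volume in arbitrary lattice units

chi_fluctuating = topological_susceptibility(fluctuating, volume)
chi_frozen = topological_susceptibility(frozen, volume)
print(chi_fluctuating, chi_frozen)
```

A run that never leaves one sector returns the same value regardless of the true susceptibility, which is why a direct average stops being informative close to the continuum limit.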

Nevertheless, Borsanyi and collaborators 
compute the topological properties of high- 
temperature QCD with a remarkable degree 
of accuracy and sophistication. They achieve 
this, thanks to an admirable combination of 
new ideas and an ingenious use of established 
techniques. The authors’ main technical 
improvement with respect to other studies” ”° 
is a beautiful strategy to overcome the prob- 
lems associated with the reduced number of 
fluctuations between different topological 
sectors when approaching the continuum 
limit. Rather than waiting for fluctuations to 
occur, the authors perform simulations in a 
fixed sector, estimate the probability of fluc- 
tuations between sectors and then reconstruct 
the topological susceptibility for a wide range 
of temperatures. 
Figure 1 | The hunt for axions. Borsanyi et al.’ have calculated the expected mass of a hypothetical
particle called the axion. Assuming that axions account for dark matter (the ‘missing’ matter in the
Universe), the authors determine that the mass of these particles (expressed in energy units based
on the electronvolt; eV) must be between 50 and 1,500 μeV. Also shown are estimates of the projected
sensitivities of four experiments that have been either proposed or commissioned to look for axions


Calculating the present-day axion mass 
requires knowledge of the relationship 
between the temperature of the Universe 
and the time elapsed since the Big Bang. 
Borsanyi and colleagues’ work studies the 
physics of QCD up to temperatures of more 
than 10”° kelvin, whereas most lattice simu- 
lations®’’ are limited to temperatures of less 
than about 7 x 10K. The authors discuss the 
temperatures at which ‘charm’ and ‘bottom’
quarks contribute to the thermodynamics of 
the system, in addition to the contributions of 
the other elementary particles. The final results 
are obtained thanks to a skilful combination of 
state-of-the-art lattice simulations and other 
theoretical analyses. 

Finally, Borsanyi et al. combine their 
results to obtain a prediction that the axion 
mass (expressed in energy units based on the 
electronvolt; eV) lies between 50 and 1,500 μeV
(Fig. 1). If the authors’ findings are robust, they
will help to guide experimental searches. Current
experiments that look for axions¹¹ either
try to convert these particles into electro- 
magnetic pulses or observe the effect of axions 
on the spin of neutrons — in both cases, 
assumptions need to be made regarding the 
expected mass. 

The accuracy of Borsanyi and colleagues’ 
results demonstrates the power of their 
numerical approach and highlights the 
pivotal role that strong interactions might have 
in shaping our Universe, by providing a candi- 
date for dark matter. Credit must also be given 
to other researchers in the field who paved the 
way for this work by studying lattice topol- 
ogy and its connection to high-temperature 
QCD. The authors’ work emphasizes that 
numerical studies need to be triggered 
and guided by a deep understanding of the 
underlying physics, but that this can lead to 
impressive results. 




Like experimental data, numerical work 
should be validated by independent studies, 
because subtle systematic errors might have 
a substantial impact on the results. As we 
have seen, the first step to take when study- 
ing strong interactions by numerical means 
is to discretize their dynamics and then take 
the system to the continuum limit in a well- 
controlled way. This is highly complex, and is 
a subject of research in itself. Different discre- 
tization methods need to be explored, and the 
strategy designed by Borsanyi et al. must be 
further analysed and applied in other, similar 
work. Let us hope that this remarkable study 
will motivate researchers to further investigate 
the properties of the axion, with the aim of 
giving experiments the robust theoretical input 
that they need.
De novo phasing with X-ray laser reveals
mosquito larvicide BinAB structure 


Jacques-Philippe Colletier, Michael R. Sawaya, Mari Gingery, Jose A. Rodriguez, Duilio Cascio, Aaron S. Brewster,
Tara Michels-Clark, Robert H. Hice, Nicolas Coquelle, Sébastien Boutet, Garth J. Williams, Marc Messerschmidt,
Daniel P. DePonte, Raymond G. Sierra, Hartawan Laksmono, Jason E. Koglin, Mark S. Hunter, Hyun-Woo Park,
Monarin Uervirojnangkoorn, Dennis K. Bideshi, Axel T. Brunger, Brian A. Federici, Nicholas K. Sauter &
David S. Eisenberg


BinAB is a naturally occurring paracrystalline larvicide distributed worldwide to combat the devastating diseases borne 
by mosquitoes. These crystals are composed of homologous molecules, BinA and BinB, which play distinct roles in 
the multi-step intoxication process, transforming from harmless, robust crystals, to soluble protoxin heterodimers, to 
internalized mature toxin, and finally to toxic oligomeric pores. The small size of the crystals—50 unit cells per edge, on 
average—has impeded structural characterization by conventional means. Here we report the structure of Lysinibacillus 
sphaericus BinAB solved de novo by serial-femtosecond crystallography at an X-ray free-electron laser. The structure 
reveals tyrosine- and carboxylate-mediated contacts acting as pH switches to release soluble protoxin in the alkaline 
larval midgut. An enormous heterodimeric interface appears to be responsible for anchoring BinA to receptor-bound 
BinB for co-internalization. Remarkably, this interface is largely composed of propeptides, suggesting that proteolytic 
maturation would trigger dissociation of the heterodimer and progression to pore formation. 


Of all insects, mosquitoes continue to be the most injurious to human 
health, spreading devastating diseases such as malaria, filariasis, 
dengue fever, West Nile encephalitis and, more recently, Zika virus. 
Synthetic chemical insecticides are a cost-effective means of reducing 
mosquito vector populations, but their intensive application results 
in resistance. Most recently, resistance to pyrethroid insecticides 
has threatened the control of malaria. To deal with the problems of 
resistance and develop more environmentally sound and sustainable 
approaches to vector control, various biological and genetic-based 
control strategies are under development. These include the use of 
microorganisms such as wolbachial endosymbionts that interfere with 
pathogen development and transmission¹, genetically engineered
female mosquitoes incapable of flight or pathogen transmission², and
pathogens such as bacteria or fungi engineered for higher efficacy
against larvae or adults³,⁴.

Currently, one of the most environmentally safe and efficient means
of controlling mosquitoes is the distribution of naturally occurring
protein crystals from Bacillus thuringiensis subsp. israelensis (Bti)
and L. sphaericus (formerly Bacillus sphaericus), as practiced in the
United States, Germany, China, Thailand, Ivory Coast and Cameroon⁵.
These proteins are toxic to their targets, but harmless to humans and
other animals. The specificity and efficiency of the L. sphaericus
binary toxin, BinAB, presumably arise from its structural complexity,
which allows it to navigate through a series of barriers en route
and recognize only its intended target. Originating as protoxins inside
L. sphaericus, BinA and BinB, in a 1:1 ratio, pack into crystals that
assemble on the inside of the exosporium, the outside layer of the spore.
Formulations of spores containing these crystals are distributed in
aquatic environments where mosquitoes breed. Larvae ingest the spores,
whereupon the crystals dissolve in the alkaline midgut juices (pH 8–11)
and release heterodimers. These are activated by the cleavage of four
terminal propeptides. BinB recognizes a glycolipid-anchored maltase
receptor located at the microvillar surface of midgut cells⁶ and assists
the internalization of BinA, which carries the toxic function⁷. Finally,
the BinAB complex enters and kills midgut cells, which results in larval
death. Recently, the structure of BinB from L. sphaericus was determined⁸.
Here, we elucidate the structure of BinAB crystals, revealing
features that endow BinA and BinB with their respective functions, and
we suggest a mechanism by which alkalinity and proteolytic activation
trigger a series of structural rearrangements that navigate BinAB past
barriers to reach its target while maintaining the resilient 1:1 association
throughout the life cycle⁹.


Determination of the BinAB toxin structure 

We collected data from the native BinAB complex and three heavy-atom
derivatives using serial femtosecond crystallography (SFX) methods
at the Linac Coherent Light Source (LCLS) Coherent X-ray Imaging
(CXI) instrument¹⁰, with the aim of phasing by multiple isomorphous
replacement with anomalous scattering (MIRAS). We produced nanocrystals
of L. sphaericus BinAB (Fig. 1a, b) by recombinant expression
in B. thuringiensis cells and jetted them across the pulsed X-ray
free-electron laser (XFEL) beam using either a gas dynamic virtual
nozzle (pH 7 crystals)¹¹ or an electrospinning injector (pH 5 and pH 10
crystals)¹² (Extended Data Fig. 1a). Initial structure factor amplitudes
were obtained with cctbx.xfel¹³, and then corrected for partiality with
cctbx.prime¹⁴ (Extended Data Table 1). This procedure was essential
to the success of phasing (Fig. 1c). Heavy-atom sites were successfully
located for all three derivatives (Extended Data Fig. 1b–d), and their
Figure 1 | De novo phasing of SFX data collected from nanocrystals
of L. sphaericus BinAB. a, b, Phase-contrast micrograph of B. thuringiensis 
sporulated cells engineered to produce BinAB nanocrystals (a) and 
scanning electron micrograph of isolated BinAB nanocrystals (b). c, The 
contribution of partiality refinement to the accuracy of the experimental 
phases is illustrated by maps calculated without (top row) and with 
(bottom row) the post-refinement procedure. The left three columns 
show SIRAS-phased maps from p-chloromercuribenzene sulfonate 
(PCMBS), Gd, and vaporizing iodine labelling (VIL) derivatives. The 
fourth column shows MIRAS phased maps from all three derivatives. 

The right column shows the refined model and 2Fo−Fc map. The
improvement from post-refinement is quantified by the correlation 
coefficient (CC) (at the bottom of each panel) between each experimental 
map and the map phased by the final refined coordinates. Main chain and 
side chain atoms are shown in black and grey sticks, respectively. 


phasing information combined (Extended Data Figs 1e, 2a) to produce
a map of sufficient quality that 60% of the BinAB complex could be
traced automatically (Extended Data Fig. 2b, c). Subsequent manual
building led to a model with an Rwork/Rfree of 0.164/0.200 at 2.25 Å
resolution (Extended Data Table 1).
Figure 2 | BinA and BinB folds and carbohydrate-binding modules.

a, BinA and BinB are structurally similar to each other, each composed 
of trefoil and pore-forming domains. The most noticeable differences 
correspond to insertions in surface loops on the trefoil domains (purple). 
b, The trefoil domains are composed of barrel and cap subdomains. 

In the cap subdomain, there are three canonical carbohydrate modules
(α, β and γ). Carbohydrate molecules superimposed from structures of
haemagglutinin (3AH1)*’ and haemolytic lectin (1W3G)*° are shown 

in black sticks. These occupy the α, β and γ binding modules. A fourth
The maps revealed two rod-shaped molecules, BinA and BinB,
each approximately 100 Å long and 25–30 Å in diameter (Fig. 2a).
They resemble each other closely (1.7 Å root mean squared deviation
(r.m.s.d.) for 329 pairs of α-carbons; Extended Data Fig. 2d), reflecting
the 28% sequence identity (46% similarity) covering nearly their full
lengths (Extended Data Fig. 2e). Each subunit comprises two domains:
an N-terminal β-trefoil domain (BinA residues 1–155, BinB residues
1–198) and a C-terminal pore-forming domain (PFD) (BinA residues
156–370, BinB residues 199–448).
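
For readers unfamiliar with the r.m.s.d. figure quoted above, the following is a minimal sketch of how such a value is computed for paired α-carbon coordinates. It assumes the two coordinate sets have already been optimally superimposed (the superposition step is omitted), and the function name and toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean squared deviation between paired 3D coordinates, in the input units."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between each matched pair of atoms.
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Toy example: every atom displaced by 1 unit along x.
set_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
set_b = [(1.0, 0.0, 0.0), (2.0, 1.0, 1.0)]
toy_rmsd = rmsd(set_a, set_b)
print(toy_rmsd)
```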

The most notable structural differences between BinA and BinB are 
located on the carbohydrate-binding modules of the trefoil domains, 
suggesting that these may contribute to the distinct roles of BinA and 
BinB in intoxication. All three modules (α, β and γ) of BinA appear
structurally capable of binding carbohydrate; however, BinB contains
four loop insertions relative to BinA (residues 62–70, 111–117,
139–143 and 179–185), which distort the trefoil’s pseudo-three-fold
symmetry (Fig. 2c and Extended Data Figs 2e, 3a). One insertion,
residues 62–70, completely obstructs the α-module through a disulfide-linked
tether, Cys67–Cys161 (Fig. 2c and Extended Data Fig. 3b). These
differences suggest that BinB has a lesser role in carbohydrate binding, or
perhaps an adaptation to diverse carbohydrates. Carbohydrate binding
has been linked with toxicity. Numerous mutations in these modules
reduce toxicity (Supplementary Discussion and Supplementary
Tables 1, 2) and sugars such as chitobiose are potent antagonists of the
BinA toxin¹⁵. The outward-facing orientation of the modules in the
BinAB dimer indicates that they are accessible to cell-surface glycoproteins
and glycolipids, which may aid in concentrating BinAB at the
cell surface before receptor binding (Fig. 3b).

Relatively few structural features distinguish BinA from BinB in
the PFD, suggesting that pore assembly may be heteromeric¹⁶. The
PFD topology is characteristic of the aerolysin family of pore-forming
toxins, comprising a 60 Å-long antiparallel β-sheet, folded into a
sandwich at one end (sandwich subdomain), and adherent to a putative
membrane-spanning segment (transmembrane subdomain)¹⁷ at
the other end (sheet subdomain; Fig. 2a and Extended Data Fig. 4).
carbohydrate-binding site, marked III,, is a minor site observed in
haemagglutinin”. c, View of the cap subdomains along the pseudo- 
three-fold symmetry axis; loop insertions (purple) break the symmetry 
in BinB. The starburst indicates a steric overlap between the modelled 
carbohydrate and the α-module of BinB. The conflict arises from the
9-residue insertion in this loop, tethered by a disulfide bond, C67-C161. 
d, View of the barrel subdomains along the pseudo-three-fold symmetry 
axis. BinB residues implicated in receptor binding are shown in sticks 
(orange). Structurally analogous residues are shown on BinA. 
Figure 3 | BinAB dimer assembly is weakened by proteolysis. a, The
BinAB dimer shown here manifests the largest intermolecular interface in 
the crystal. The vertical arrow and lens-shaped symbol (black) indicate the 
position of the pseudo-two-fold rotation axis that relates BinA and BinB. 
The interface extends over all four domains in both molecules. Propeptides 
(dark blue and dark red) play a substantial role in the interface. b, Surface 
representation of the BinAB dimer shows that the canonical carbohydrate 
modules are accessible, but the receptor-binding epitope is not, implying 
that carbohydrate binding occurs first and then a conformational change 
exposes the receptor binding epitope. c-d, Transformation of the BinAB 


This structural homology suggests that BinAB may form pores by a
mechanism similar to aerolysin¹⁸, in which the transmembrane subdomain
refolds into a hairpin, oligomerizes into a β-barrel, and inserts
into the membrane. However, no experiment has revealed whether
these pores would be homo- or hetero-oligomeric. We find that the
degree of structural similarity between the PFDs of BinA and BinB
(1.4 Å r.m.s.d. over 203 α-carbons; Extended Data Fig. 2d) is even closer
than that observed between LukF and Hlg2 (1.5 Å r.m.s.d. over 241
α-carbons), which adopt alternate positions around the ring-shaped
heterooctameric pore of γ-haemolysin¹⁹. A heteromeric assembly analogous
to γ-haemolysin might explain why BinA and BinB form pores
more efficiently in combination than either component separately¹⁶,²⁰.


Propeptides at the BinAB dimer interface 
The crystal packing reveals a unique and exceptionally large heterodimer
interface, specifying a resilient 1:1 association between the two
components. In this interface, the 100 Å-long BinA and BinB molecules
cross each other at a 30° angle, forming an X shape (Fig. 3a, b). This
interface is so large (burying 1,855 Å² on BinA and 1,800 Å² on BinB)
that it accounts for nearly half of all the intermolecular interface area in
the crystal. It exceeds by far the threshold value (856 Å² per monomer)
estimated to discriminate between biological and artificial dimers²¹. It
is three times larger than the next largest interface in the crystal, and
therefore it is likely to be the only one of the total seven intermolecular
interfaces preserved after dissolution of the crystals (Extended Data
Figs 5, 6 and Supplementary Tables 3–9). Its shape complementarity
(0.65) is similar to that observed in antibody–antigen interfaces²² and
electrostatic complementarity is evident (Extended Data Fig. 7a, b).
The fit and extent of this heterodimer association imply stability,
which probably contributes to attaining the 1:1 molar ratio shown to be
optimal for receptor binding and toxicity (Supplementary Tables 1, 2).

Further examination of the heterodimer interface reveals that its
size would be severely reduced by proteolytic activation and that this
loss would appear to threaten access of BinA into the cell. Remarkably,
42% of the heterodimer interface involves propeptide segments, that
is, 1,539 Å² combined interface surface area (Fig. 3c, d, Extended Data
Figs 5c, 6a, c, d, f, and Supplementary Tables 3, 10). Therefore, removal
of these cleavable segments before internalization would dismantle
what may be the essential interface tethering BinA to its chaperone,
the cell-surface-receptor-bound BinB. Indeed, deletion of the largest of

interface accompanying proteolysis. The BinAB dimer is split apart to 
reveal the interface. All four subdomains and three propeptides contribute 
to the BinAB interface (c). Dashed lines connect specific propeptide 
residues in contact across the dimer interface. Dissociation of the 
propeptides following proteolysis eliminates 42% of the interface (d). 
Dotted lines encompassing white patches mark the interface lost after 
dissociation of the propeptides. Dashed lines connect select residues 
remaining in contact across the dimer interface following propeptide 
dissociation. The transmembrane subdomain (yellow) is the only 
subdomain that does not lose contacts after proteolysis. 


the four propeptides (BinB residues 396–448) reduces the ability of this
truncated mutant to direct the regional binding of BinA to target cells²³.
However, we note that owing to the large surface area it buries in BinAB
(3,790 Å²), this 53-residue propeptide would be slow to release. Slow
release may delay heterodimer dissociation until after internalization, 
when, at the targeted location, it signals transformation into a pore. 
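
The interface figures quoted in this section can be cross-checked with simple arithmetic. The sketch below assumes that the 42% propeptide contribution is expressed relative to the sum of the areas buried on BinA and BinB; the variable names are illustrative.

```python
# Buried surface areas quoted in the text, in square angstroms.
buried_on_bin_a = 1855.0
buried_on_bin_b = 1800.0
propeptide_area = 1539.0

# Assumption: the quoted percentage is taken over the combined buried area.
total_interface = buried_on_bin_a + buried_on_bin_b
propeptide_fraction = propeptide_area / total_interface
remaining_area = total_interface - propeptide_area

print(f"combined interface: {total_interface:.0f} A^2")
print(f"propeptide share:   {100 * propeptide_fraction:.0f}%")
print(f"left after removal: {remaining_area:.0f} A^2")
```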


Alkaline-triggered release of BinAB dimers 

Of all the transformations undergone by BinAB over its life cycle, 
our data bear most directly on the pH-signalled transformation from 
crystal to soluble dimers. Using crystallography as a means of struc- 
ture elucidation, we can see in detail the molecular contacts that are 
the focus of this transformation and monitor their perturbation with 
elevation of pH from 7 to 10. The most notable attribute of this trans- 
formation is the seemingly contradictory combination of both stability 
and sensitivity that evolved to adapt the crystal to different stages of the 
life cycle. Crystal stability preserves and stores potency in harsh envi- 
ronments before ingestion, yet alkalinity readily dissolves the crystal 
in the larval midgut (pH 8-11) after ingestion, releasing BinAB to 
access cell-surface receptors and activating proteases. The mechanism 
by which alkalinity lowers the high barrier to crystal dissolution is 
evident at three levels: amino acid composition, local structural
changes and global structural changes.

Alkalinity may facilitate crystal dissolution through the concerted
deprotonation of BinAB’s 49 tyrosine hydroxyl groups, occurring
around pH 10 (pKa = 10.1) (Extended Data Fig. 7c). The frequency
of tyrosine residues (6%) is almost twice the average for proteins
(3.5%)²⁴. We calculate that pH elevation from 7 to 10 would change
the net charge on BinAB from −13.9 to −73.5 e, thereby destabilizing
the crystals through negative electrostatic repulsion (Extended Data
Fig. 7d, e). An analogous mechanism was proposed for the dissolution
of the ultra-stable viral spindle crystals, which also contain an unusually
high frequency of tyrosines (8.6%), some located strategically at crystal
contacts²⁵.
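
The tyrosine contribution to this charge shift can be estimated with the Henderson–Hasselbalch relation. The sketch below is a simplification that treats all 49 tyrosines as independent groups sharing the quoted pKa of 10.1; it covers only the tyrosine part of the net-charge change reported above and does not model the other titratable groups.

```python
def deprotonated_fraction(ph: float, pka: float) -> float:
    """Fraction of an acidic group in its deprotonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

TYROSINE_COUNT = 49    # from the text
TYROSINE_PKA = 10.1    # from the text

# Each deprotonated tyrosine hydroxyl carries a charge of -1 e.
charge_ph7 = -TYROSINE_COUNT * deprotonated_fraction(7.0, TYROSINE_PKA)
charge_ph10 = -TYROSINE_COUNT * deprotonated_fraction(10.0, TYROSINE_PKA)

print(f"tyrosine charge at pH 7:  {charge_ph7:.2f} e")
print(f"tyrosine charge at pH 10: {charge_ph10:.2f} e")
```

Under these assumptions the tyrosines are essentially neutral at pH 7 and a little under half of them are deprotonated at pH 10, so they supply a substantial share, but not all, of the added negative charge.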

Alkalinity observably perturbs four regions in BinAB crystals, identified
by peaks in a difference Fourier map (Fo−Fo) computed between
data sets collected at pH 7 and pH 10 (Fig. 4, Extended Data Fig. 8
and Supplementary Tables 11, 12). These regions involve contacts 
between molecules or subdomains. Two of these perturbations increase 
accessibility of propeptide segments to proteases. The N-terminal 
Figure 4 | pH sensing in BinAB crystals. a–f, BinA and BinB cartoons are
coloured by subdomain. The pH 5, pH 7 and pH 10 structures are shaded
light, medium, and dark, respectively. Symmetry-related molecules are
coloured similarly. The Fo(pH 10)−Fo(pH 7) map is superimposed at ±3.5σ,
with positive and negative peaks shown in green and red, respectively.
a and b show orthogonal views of the BinAB dimer. The Fo−Fo map
reveals four regions (c–f) of the BinAB crystal that are perturbed by
elevated pH, probably reflecting early events in crystal dissolution.
Rupture of hydrogen bonds and conformational changes are highlighted
by starbursts and arrows, respectively. c, Deprotonation of Asp8, Glu14
and Tyr213 triggers a helix to extended β-strand conformational change in
the BinA N-terminal propeptide, increasing its accessibility to proteolytic 
cleavage. d, Deprotonation of the BinB Gln448 terminal carboxylate breaks 
its H-bond with BinA Asp22, weakening the BinA-BinB dimer interface 
and making the C-terminal propeptide of BinB available for proteolysis. 

e, Deprotonation of BinA Tyr134 and His125 results in the rupture of their 
H bonds with BinB Glu59. Loss of these contacts directly weakens the 
lattice. f, Negative electrostatic repulsion pushes Asp342 away from Glu240 
in BinA. This rearrangement could be an early step in the transformation 
into a pore, owing to Asp342’s location at the junction of the three PFD 
subdomains. 


propeptide of BinA (residues 1-10) unravels from a helix to an 
extended conformation, following the loss of a hydrogen bond between 
Gly15(O) and Tyr213(OH) of BinA (Fig. 4c and Extended Data Fig. 9). 
Similarly, a pair of hydrogen bonds is broken in the BinAB dimer 
interface between the C-terminal carboxylate of the BinB propeptide 
(Gln448) and BinA Asp22 side-chain carboxylate (Fig. 4d). The third 
region is a lattice contact outside the dimer interface, between BinA 
Tyr134(OH) and BinB Glu59(OE2) which breaks at elevated pH (Fig. 4e 
and Supplementary Table 7). The fourth region involves loss of a 
hydrogen bond between BinA Asp342(OD1) and Glu240(OE1). Located 
at the junction between the transmembrane and sandwich subdomains, 
this break might be an early step towards pore formation (Fig. 4f 
and Extended Data Fig. 8). In each of these four regions, deprotona- 
tion is presumably the cause of hydrogen-bond disruption, involving 
either a tyrosine hydroxyl or a carboxylate group paired with an obligate 
hydrogen bond acceptor (Fig. 4 and Supplementary Discussion). 
Alkalinity also induces global hinge motions, potentially straining
the crystal lattice and thereby contributing to its dissolution. The 
trefoils of BinA and BinB move about 0.5 A closer to their respective 
PFDs (Extended Data Fig. 9 and Supplementary Discussion). 
These motions might also foreshadow a rearrangement of the dimer 
interface, to expose the receptor-binding motif, which is otherwise 
buried in the dimer interface (Fig. 3a, b). 

To validate our model of alkaline-induced crystal dissolution 
and address concerns that the structural changes we attribute to pH 
elevation may have instead originated from unintended radiation 
damage (Supplementary Discussion), we mutated one of the four 
pH-sensitive switches that we identified, BinA Asp22. Recall that 
upon pH elevation, withdrawal of donor hydrogen atoms breaks the 
bifurcated hydrogen bond between carboxylates of Asp22 and the 
C terminus of BinB (Fig. 4d). Mutation from Asp to Asn should reduce 
toxicity by decreasing sensitivity of this interaction to alkaline-induced 
rupture and thereby delay release of the C terminus for proteolytic 
activation. This mutation did not change the appearance of the crystals; 
however, it reduced crystal solubility by 30% in the period 30-90 min 
after pH elevation (Extended Data Fig. 7f) and increased LC50 and
LC95 by 11.6-fold and 24-fold, respectively (Supplementary Table 13).
The effect of this single conservative substitution in a 93-kDa complex 
provides convincing validation of our model. 

Our structure of BinAB illuminates several important molecular 
events in the life cycle of the toxin. These include discovery of (1) four 
pH-sensitive switches that facilitate crystal dissolution in the larval 
midgut; (2) a large heterodimer interface that explains how heterod- 
imers persist after dissolution; (3) three competent carbohydrate- 
binding modules in BinA that may assist in directing heterodimers 
to the cell surface; and (4) the potential to disrupt the heterodimer 
interface by proteolytic activation, thereby signalling remodelling. Our 
success in de novo MIRAS phasing of a crystallographic asymmetric 
unit nearly three times larger than any previously phased de novo by 
SFX²⁶–²⁸, and from crystals approximately 50 unit cells per edge, suggests
this approach could be applied again in other cases where crystal
size is limiting. 

Online Content Methods, along with any additional Extended Data display items and 


Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


In vivo production of Lysinibacillus sphaericus BinAB crystals. L. sphaericus 
BinAB crystals were grown in an acrystalliferous Bacillus thuringiensis strain 
engineered to improve crystal expression, 4Q7 (Bacillus Stock Center at Ohio 
State University, Columbus, Ohio), which allowed growth of much larger crystals 
than occur naturally in L. sphaericus. Testing for mycoplasma was not carried 
out. There is no evidence that growth of the L. sphaericus BinAB crystals in 
B. thuringiensis in any way affected the structure of BinAB; the toxicity per unit 
mass of the Bin crystals produced in B. thuringiensis was the same as that of 
crystals produced in L. sphaericus. Assuming that the cytosolic compositions of 
L. sphaericus and B. thuringiensis cells do not strongly differ, crystal packing inter- 
actions are expected to persist and crystals formed by BinAB in the two types 
of cells should be identical in terms of space group and unit cell constants. Five 
hundred millilitres of glucose-yeast-salts (GYS) growth medium (0.1% glucose,
0.2% yeast extract, 0.05% K2HPO4, 0.2% (NH4)2SO4, 0.002% MgSO4, 0.005%
MnSO4, and 0.008% CaCl2) supplemented with 25 μg/ml erythromycin was
sterilized in a 2-l baffled flask and inoculated with spores from a lyophilized 5-day
lysate of B. thuringiensis subsp. israelensis strain 4Q7 containing plasmid pPHSP-1 
(Bti4Q7/pPHSP-1), which encodes BinA and BinB from L. sphaericus strain 2362 
(ref. 31). Of all known L. sphaericus strains, this strain produces the most potent 
toxin. Cultures were harvested after growth for 5 days at 30°C with shaking at 
250 rpm, monitored by phase contrast light microscopy until sporulation and cell 
lysis were observed. 

We attempted to obtain crystals of BinAB toxin incorporating the unnatural 
amino acid 3-iodo-tyrosine (3iTyr) at tyrosine positions in BinA and BinB. By 
exposing early-sporulating cells to excess 3iTyr, phenylalanine, and tryptophan, 
we aimed to block aromatic amino acid biosynthesis and force the utilization of 
the iodine-labelled amino acid. Bti4Q7/pPHSP-1 cells were grown to late log 
phase and exposed to 3iTyr, phenylalanine, and tryptophan at concentrations of 
100 mM each. Two further additions of the three amino acids, each increasing the 
concentration by 65 mM, were made at 24-h intervals. Cultures were harvested 
after 4 days. However, SDS-PAGE analysis showed partial or no incorporation of 
3iTyr in the BinAB crystals. Furthermore, there was no evidence of incorporation 
in the final structure of BinAB. Following culture lysis, spores, crystals, cells, and 
cell debris were pelleted by centrifugation at 6,000g for 30 min, then resuspended 
in 50 ml sterile water. The concentrated lysate was sonicated for 3 min on ice 
(1 s on, 1 s off (6 min elapsed time); 60% intensity) to lyse the remaining cells.
The sonicated lysate was pelleted at 6,000g for 30 min at 4°C. All subsequent steps 
were performed at 4°C. The pellet was washed in 50 ml cold water to remove 
soluble material and some of the spores, then re-pelleted, and resuspended in 
15 ml water. Crystals were isolated from the suspension by sucrose gradient 
centrifugation as described in ref. 32. Before injecting the sample into the beam, 
large particles were removed by filtering the material with a 10-μm stainless
steel frit. 

Construction of pPHSP1 D22N and pBUSP-1 and pBUSP-1 D22N clones. The 
recombinant plasmid pPHSP-1, containing BinA and BinB open reading frames
under control of the pSTAB promoter’, was used as a template to amplify two 
DNA fragments each containing the D22N mutation in the BinA ORF, using Q5
polymerase (New England Biolabs). The first fragment began at the MluI site 
near the 3’ end of the BinB ORF and went through the D22 region of the BinA 
ORF, and contained the BinA D22N mutation; this fragment was amplified using
primer pair [1]. The second fragment began at the D22 region and continued
to the end of the Bin operon terminator, and was amplified with primer pair [2]. 
Both fragments had approximately 15-bp overlapping homology regions with 
the vector on one end and the second fragment on the other. The two fragments 
were assembled along with pPHSP-1 linearized with MluI and PstI by recombineering
using the Choo-Choo Cloning Kit (MCLAB). The entire Bin operon of
the resulting clone was sequenced to confirm the presence of the D22N mutation. 
Full-length fragments of the entire Bin operon of either wild-type pPHSP-1 or 
pPHSP-1-D22N, including approximately 15-bp vector homology regions on 
both ends, were amplified using primers [3], and assembled by recombineering 
in pBU4™ linearized with KpnI and SalI. The resulting plasmids, pBUSP-1 and 
pBUSP-1 D22N, which also contained the 20-kDa helper protein gene as previ- 
ously described*>*° were sequenced and used to transform Bacillus thuringiensis 
strain 4Q7 by electroporation*. 

Primer pairs: 

[1] TTCACACTAAAACGCGTTAATGGTGAAATTG (BinA D22N For 1) and 
CTCGCTATTATAAAAATTCATAACGCGAATGTACTTTCCTTCTG (BinA 
D22N Rev 1) 

[2] CGTTATGAATTTTTATAATAGCGAGTATCCTTTCTGTATACATGCACC 
(Bin D22N Frag 2 F2 New) and CCAAGCTTGCATGCCTGCAGC (BinA D22N 
Rev 2) 


[3] GTGAATTCGAGCTCGGTACCGAATTCTATTTTCGATTTCAAATTTTCCAAAC (HSP Promoter For) 
and GGGTGTTAACGTCGACAAACAACAACAGTTTACATTCGA (HSP Bin Term. Rev) 

Bioassay for mosquito larvicide activity. Lyophilized cultures containing spores 
and parasporal bodies of the L. sphaericus BinAB and BinA(D22N)B strains were 
resuspended in ddH2O. Suspensions were diluted to 6-7 different concentrations, 
ranging from 0.5 ng/ml to 1 µg/ml, in 6 oz cups in a final volume of 100 ml. 
Bioassays were replicated three times using 30 fourth-instars of S-Laboratory 
(Bin-sensitive) strains of Cx. quinquefasciatus. After 24 h of exposure at 28 °C, 
dead larvae were counted and the 50% and 95% lethal concentrations (LC50 and 
LC95, respectively) were calculated by Probit analysis (POLO-PC; LeOra 
Software). 

Assay for alkaline-induced crystal dissolution for wild-type BinA-BinB 
and BinA(D22N)-BinB. The strains producing BinAB (4Q7/pPHSP-1) and 
BinAB(D22N) (4Q7/pBU-D22N) were grown in nutrient broth + glucose (NBG) 
supplemented with, respectively, erythromycin (25 mg/ml) and tetracycline 
(3 mg/ml), for 5 days until >95% of cells had sporulated and lysed. To isolate 
crystalline inclusions, spore/crystal mixtures collected from 50-ml cultures 
were resuspended in 15 ml ddH2O and sonicated twice at 50% duty cycle for 
15s using the Ultrasonic Homogenizer 4710 (Cole-Parmer Instrument Co.). 
Five-millilitre samples were loaded onto a sucrose gradient cushion (30-65% 
w/v), which was then centrifuged at 20,000g for 45 min at 20°C in a Beckman 
L7-55 ultracentrifuge using the SW28 rotor. Bands containing inclusions were 
collected and washed three times in ddH2O, followed by centrifugation at 6,500g 
for 15 min at 4°C after each wash, then lyophilized for storage at −20°C until 
use. For solubilization assays, 200 µg or 800 µg of each crystal preparation was 
resuspended in, respectively, 100 µl of 10 mM Tris-Cl (pH 7; baseline control) or 
400 µl of 10 mM Tris-Cl (pH 10; solubilization buffer). Assays were performed in 
triplicate. For each sample, three 25-µl aliquots were collected and spun at 16,300g 
for 2 min to pellet undissolved crystals, and supernatants were transferred to fresh 
microfuge tubes containing 75 µl of 10 mM Tris-Cl (pH 10). Absorbance at 280 nm 
(A280) was recorded at various time intervals. 

Preparation of the mercury derivative. We produced the mercury derivative by 
soaking the BinAB crystals in 1 mM p-chloromercuribenzene sulfonate (PCMBS) 
for five months. Specifically, we added 200 µl of 0.1 M PCMBS to 15 ml crystal 
slurry at room temperature, and then we passed the mixture through a stainless 
steel frit with 10-µm pores. We estimated the concentration of crystals in the 
slurry by pelleting the crystals in a centrifuge, then noting the crystals occupied 
approximately 25 µl of the 1-ml sample volume. The five-month incubation was not 
intentional. We intended to use the derivative on the day it was prepared, which was 
during a previous data collection trip; however, this sample immediately clogged 
the gas dynamic virtual nozzle (GDVN) injector upon its first use. So, virtually 
none of this large volume of sample passed through the injector. In retrospect, 
we think the clogging was the result of mercuric iodide precipitation induced by 
residual iodide ion in the delivery lines; a derivative containing 0.5 M potassium 
iodide had indeed been run just previously to injecting the PCMBS derivative. To 
reduce the chance of this same PCMBS sample clogging again, we removed the 
non-covalently bound PCMBS by washing the crystals twice in deionized water 
a week before the LCLS experiment. The crystals used for derivatization in this 
experiment have the same origin as those we used to produce our effectively ‘native’ 
data set. They were prepared under conditions intended to substitute 3iTyr in place 
of tyrosine, but the substitution was not efficient and so they were effectively native 
before derivatizing with PCMBS. 

Preparation of the gadolinium derivative. We produced the gadolinium derivative 
by soaking the BinAB crystals in 5 mM GdCl3. More specifically, we added 50 µl 
of 0.1 M GdCl3 to 1 ml crystal slurry at room temperature one day before the 
diffraction experiment. We estimated the concentration of crystals in the slurry 
by pelleting the crystals in a centrifuge, then noting the crystals occupied 
approximately 25 µl of the 1-ml sample volume. In order to conserve the amount of crystals 
used in screening for a heavy atom derivative, we recycled the crystal sample that 
had been used in previous diffraction experiments performed in helium vapour 
atmosphere (that is, not vacuum). For example, the crystals used in this experiment 
had been soaking for five months in KI and CsCl since the time they cycled through 
a GDVN nozzle for data collection in a previous XFEL experiment. The sample 
could be recycled without fear that the previous experiment had damaged the 
crystals since over 99.9% of the crystals never intercepted the XFEL beam during 
the previous XFEL experiment. We washed the KI and CsCl from the crystals by 
pelleting the crystals and replacing the supernatant with water three times before 
adding GdCl3. 
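
The nominal soak concentration follows from the stock concentration and the volumes mixed. The snippet below is a minimal sketch of that arithmetic for the gadolinium soak; the function name is illustrative, and it assumes that volumes are additive and that the volume taken up by the crystals themselves can be ignored.

```python
def final_concentration_mM(stock_molar: float, stock_ul: float, slurry_ul: float) -> float:
    """Concentration (mM) after adding `stock_ul` of stock to `slurry_ul` of slurry.

    Assumes additive volumes; the crystal volume is not subtracted.
    """
    total_ul = stock_ul + slurry_ul
    return stock_molar * 1000.0 * stock_ul / total_ul


# Gadolinium soak: 0.1 M stock added to a 1-ml (1,000 µl) crystal slurry.
gd_mM = final_concentration_mM(stock_molar=0.1, stock_ul=50.0, slurry_ul=1000.0)
print(f"GdCl3 soak concentration: {gd_mM:.2f} mM")
```

Under these assumptions the result lands slightly below the nominal 5 mM because the added stock also increases the total volume.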

Preparation of the VIL derivative. We produced an iodinated tyrosine derivative 
by adapting the vaporizing iodine labelling (VIL) methods described in ref. 37. 
The method uses gaseous iodine vapour to derivatize the ortho positions of 
accessible tyrosine residues. In our case, rather than derivatizing a few crystals in 
a standard 1-5-µl drop, we derivatized 1 ml crystal slurry. To improve accessibility 
of the iodine vapour to this relatively larger volume of crystals, we distributed 
the 1 ml of BinAB crystal slurry evenly over nine wells of a glass depression dish 
(that is, 111 µl per well). Four 10-µl aliquots of a KI/I2 mixture (0.67 M/0.47 M 
concentration) were placed on the dish, adjacent to but not contacting the crystal 
slurry in the wells. The plate was sealed inside a sandwich box with 5 ml of water 
to act as a reservoir. The three components (drops of crystal slurry, the iodine, 
and the water reservoir) were separated in space but in vapour contact. The sand- 
wich plate was sealed with tape at room temperature. The drops of crystal slurry 
turned yellow after 3.5h. The crystals were harvested after 21h. We estimated the 
concentration of crystals in the slurry by pelleting the crystals in a centrifuge, then 
noting the crystals occupied approximately 25 µl of the 1-ml sample volume. The 
crystals used in this experiment had the same history of origin as those used for 
the gadolinium derivative. They had been soaking for five months in KI and CsCl 
for a previous XFEL experiment, then jetted across the beam in helium vapour 
atmosphere, and washed three times to remove the non-covalently bound heavy 
atoms before derivatizing with iodine. 

Sample injection using GDVN (pH 7 crystals). BinAB nanocrystals of the native 
and derivatives (average volume is 0.75 × 0.5 × 0.25 µm³) were injected using a 
gas-focused dynamic virtual nozzle (GDVN) liquid microinjection system* at the 
Coherent X-ray Imaging (CXI) instrument of LCLS"”. Data were collected on two 
occasions, one for native data and one for heavy atom data, for a combined time 
of 245.9 min, with the detector framing at 120 Hz. 
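
The number of detector frames implied by these figures follows from simple arithmetic. A minimal sketch, assuming the detector framed continuously at 120 Hz for the whole 245.9 min (the function name is illustrative):

```python
def frames_recorded(minutes: float, frame_rate_hz: float) -> int:
    """Number of detector frames for a given collection time and frame rate."""
    return round(minutes * 60.0 * frame_rate_hz)


total_frames = frames_recorded(minutes=245.9, frame_rate_hz=120.0)
print(f"frames recorded: {total_frames:,}")
```

Any pauses in collection would lower the true count, so this is an upper estimate.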

Preparation of pH 5 BinAB crystals. Crystals of native BinAB at pH 7 were 
pelleted by centrifugation at 10,000g for 5 min and then resuspended in pH 5 
buffer (0.1 M N-cyclohexyl-3-aminopropanesulfonate (CAPS) pH 10.0, 30% 
glycerol (v/v), 10% PEG 2000 MME, and 0.1 M NaCl). This preparation was 
originally intended to be at pH 10; however, it was later discovered that one of the 
components, polyethylene glycol monomethyl ether 2000 (PEG 2000 MME) was 
acidic (probably due to age), overpowering the pH 10 CAPS buffer, and bringing 
the final pH to 5. The pH of the crystal slurry was indicated by colour change 
on pH paper. The crystals were soaked at pH 5 for 10 min before the diffraction 
experiment. 

Preparation of pH 10 BinAB crystals. Crystals of native BinAB at pH 7 were 
pelleted by centrifugation at 10,000g for 5 min and then resuspended in pH 10 
buffer (0.11 M CAPS pH 10.0, 33% glycerol (v/v), 11% methylpentanediol, and 
0.11 M NaCl). The pH of the crystal suspension was verified by colour change 
on pH paper. The crystals were soaked at pH 10.0 for 5h before the diffraction 
experiment. 

Sample injection via MESH-on-a-stick (pH 5 and pH 10 crystals). Crystals 
of native BinAB in pH 5 and pH 10 buffers were delivered in the microfluidic 
electrokinetic sample holder (MESH) method, described more fully in ref. 12. 
The MESH injection system was modified (Extended Data Fig. 1a) to interface 
more readily with the CXI nozzle rods that mount interchangeably into the liquid 
sample delivery system. The crystal slurries were prepared as described above. 
A continuous 1.5-m-long fused silica, polyamide-coated capillary of 100 µm inner 
diameter and 360 µm outer diameter was used to deliver the sample into the CXI 
vacuum chamber. Approximately 800 µl of sample slurry with glycerol additive was 
placed in a microcentrifuge tube and placed in a small pressurized sample-holder. 
The capillary and platinum wire were fed through the pressure cell and immersed 
in the slurry. A small backing pressure of 5 psig nitrogen gas was applied to aid the 
injection. The voltage was applied by a Stanford Research Systems PS350 
high-voltage source and was held between 4,300 and 4,500 V (currents <1 µA) while the 
counter electrode was grounded. The flow rate was not directly measured, but we 
estimate that the sample consumption was approximately 2 µl/min, as judged from 
crude measurements of leftover sample volume. The pH 10 structure presented 
here was collected in less than 1h of continuous beamtime and consumed less than 
1 ml of sample. The sample injection compared favourably to prior attempts at data 
collection by giving comparable resolution with only hundreds of microlitres of 
sample consumed. 

Data collection, indexing, merging and post-refinement. The sample chamber 
was at room temperature, under vacuum. Native and derivative pH 7 data sets were 
collected with an XFEL beam focused to a 1-µm FWHM spot, and characterized 
by a wavelength of 1.21 Å (native crystals; 1.9 × 10¹¹ photons per pulse) or 1.41 Å 
(heavy atom soaked crystals; 7.2 × 10¹¹ photons per pulse). We chose a single 
wavelength for all three derivatives as a compromise of beam stability, time 
efficiency, and measurable f″ for the elements used (Extended Data Fig. 1b). 
The pH 5 and pH 10 data sets were collected with an XFEL beam focused to a 100-nm 
FWHM spot (6.4 × 10¹¹ photons per pulse), and characterized by a wavelength of 
1.46 Å. Thus, the pH 5 and pH 10 data sets were collected with a ~500-fold higher 
dose than the pH 7 data set (Extended Data Table 1). 
Images were reduced using cctbx.xfel!“°. No thresholding was applied to filter 
out blank shots; we attempted to index every frame. Furthermore, for the heavy 
atom-soaked data, if a frame indexed successfully, we removed the primary lattice 
from the list of candidate reflections and attempted to index again, searching for 
a secondary lattice. Eight per cent of the final number of indexed images from 
the heavy atom data was from secondary lattices. In total, 24.5% of all recorded 
frames from the heavy atom data contained indexable patterns (13.2% for the 
native data), for a final total of 398,971 indexed patterns across the three derivatives 
and the native data set collected at pH 7. The data sets collected at pH 5 and pH 10, 
respectively, consisted of 17,099 and 27,792 indexed patterns. First we merged and 
post-refined our data without negative intensities included using cctbx.prime!, 
which determines relative scale factors for each image, along with partiality 
estimates for each structure factor measurement. Post-refinement in this manner 
was critical for multiple isomorphous replacement with anomalous scattering 
(MIRAS) phasing as described below. Figure 1c illustrates, for the three derivatives 
and their combination, the extent to which cctbx.prime improved the 
quality of the intensities, leading to improved phases, and thereby rendering the 
path of the peptidic chain visible in the electron density map, where only broken 
density was visible before. Resolution cut-offs were determined based on 
completeness (>90%), redundancy (>4) and CC1/2 (>0.14); in all data sets, <I/sigI> 
in the highest resolution shells was greater than 6. For all data sets, 
post-refinement corrected for the inherent partiality of SFX data and thus improved the 
σ-weighted (variance-weighted) average estimate of structure factor amplitudes. 
Handling of negative intensities was then implemented in cctbx.xfel, allowing 
the production of meliorated post-refined data sets with intensity distributions 
that no longer have abnormal reflection intensity metrics (L-test and NZ-test). 
We used the same criteria for the resolution cut-off, based on completeness 
(>90%), redundancy (>4) and CC1/2 (>0.14); in all data sets, the resulting 
<I/sigI> in the highest resolution shells was greater than 0.5. While inclusion of 
negative intensities dramatically affects <I/sigI>, it results in stabilization of the 
refinement for the three native structures (pH 5, pH 7 and pH 10), requiring us 
to impose a lower weight on geometry in phenix.refine to obtain a structure with 
a plateauing Rfree. Inclusion of negative intensities in the data sets also resulted 
in a decrease of noise levels in q-weighted structure factor amplitude Fourier 
difference maps (see below). At the atomic level, the only noticeable change 
between structures refined with and without negative intensities included is an 
increase in the B-factors, reflecting the increase of the Wilson-B—and possibly 
giving a more realistic description of the BinAB structures at room temperature. 
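
The resolution cut-off criteria quoted above (completeness >90%, redundancy >4, CC1/2 >0.14) can be expressed as a simple filter over resolution shells. The sketch below is illustrative only: the `Shell` type, the function name and the shell values are made up for the example, and it assumes that all three criteria must hold in every shell up to the cut-off, which the text does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shell:
    d_min: float         # high-resolution limit of the shell, in Å
    completeness: float  # fraction between 0 and 1
    redundancy: float    # average multiplicity of observations
    cc_half: float       # CC1/2 for the shell


def resolution_cutoff(
    shells: List[Shell],
    min_completeness: float = 0.90,
    min_redundancy: float = 4.0,
    min_cc_half: float = 0.14,
) -> Optional[float]:
    """Return d_min of the last shell passing all criteria, scanning from low
    to high resolution and stopping at the first failing shell."""
    cutoff = None
    for shell in sorted(shells, key=lambda s: s.d_min, reverse=True):
        passes = (
            shell.completeness > min_completeness
            and shell.redundancy > min_redundancy
            and shell.cc_half > min_cc_half
        )
        if not passes:
            break
        cutoff = shell.d_min
    return cutoff


# Hypothetical shell statistics, for illustration only.
example_shells = [
    Shell(4.00, 1.00, 900.0, 0.99),
    Shell(3.00, 1.00, 400.0, 0.95),
    Shell(2.50, 0.99, 60.0, 0.60),
    Shell(2.25, 0.93, 6.0, 0.18),
    Shell(2.10, 0.71, 2.5, 0.05),
]
print(resolution_cutoff(example_shells))
```

Stopping at the first failing shell keeps the retained data contiguous in resolution, which is one reasonable reading of how such thresholds are applied.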
Structure determination by MIRAS. Heavy-atom sites were successfully 
located for all three derivatives using SHELXD“ (Extended Data Fig. 1c, d). 
The individual derivatives demonstrated limited phasing power, and the maps 
they produced were not interpretable. Fortunately, the three derivatives were 
isomorphous (Extended Data Fig. 1e), and so could be combined to obtain 
a map of sufficient quality to automatically trace a partial BinAB model using 
the Phenix suite’! (Fig. 1c and Extended Data Fig. 2a, b), and extend using 
Arp/Warp™. The remaining residues were built manually, leading to a model 
with Rwork/Rfree of 0.164/0.200 at a resolution of 2.25 Å (Extended Data Table 1). 
After each cycle of model rebuilding, reciprocal space refinement (including 
refinement of coordinates, atomic displacement parameters, TLS parameters and 
occupancy) was carried out using phenix.refine. The final MIRAS model features 
residues 1-357 of BinA and 28-448 of BinB. Figure 1c illustrates the benefit of post- 
refinement to the accuracy of experimental phasing. For this figure, all SIRAS and 
MIRAS maps were calculated using MLPHARE, followed by ten cycles of density 
modification with DM* enforcing 58% solvent content, and displayed at 2.8 Å 
resolution. The CC quantifies the improvement gained from post-refinement. It 
reports on the whole asymmetric unit, not just the region illustrated in the figure. 
Details of MIRAS phasing. We attempted to determine heavy-atom substruc- 
tures for each of the three derivatives by supplying both isomorphous and anom- 
alous difference signals (SIRAS), or only anomalous difference signal (SAD) to 
the program SHELXD” in super-sharpening mode (PSMF = −4). Success was 
evidenced by the clear separation between populations of false and true solutions 
ranked by goodness-of-fit. Super-sharpening enhanced this separation. We found 
that SAD substructure determination was successful for the PCMBS (mercury) and 
Gd (gadolinium) derivatives, but not for the VIL (iodide) derivative (Extended 
Data Fig. 1d). SIRAS substructure determinations were, however, successful for 
all three derivatives (Extended Data Fig. 1d). SIRAS phases of the mercury and 
iodide derivatives are of better quality than their SAD counterparts, as evidenced 
by a sharper separation between populations of false and true solutions ranked 
by goodness-of-fit (Extended Data Fig. 1d). In contrast, the SAD phases of the 
gadolinium derivative are better quality than its SIRAS phases. 

However, for each individual derivative, the correct choice of hand remained 
ambiguous after initial density modification of SIRAS phased maps and chain 
tracing with SHELXD, as judged by a lack of separation in map quality 
statistics between the two hands, such as contrast, connectivity, and correlation 
coefficient for auto-traced chains (Extended Data Fig. 2a). We sought to 
maximize the distinction between hands by systematically computing a series of maps 
at different estimated solvent contents, from 55 to 73%, and comparing the map 
quality statistics from opposing hands (Extended Data Fig. 2a). The true solvent 
content of BinAB crystals is 59.7%. We found that for each value of solvent content 
tested between 55% and 65%, a consistent choice of hand was indicated by the 
correlation coefficient of the automatically traced chain. However, at higher solvent 
content values tested, the opposite hand was indicated. The correlation coefficients 
of traces (CCs) were below 10% in all cases, and the differences between the CCs 
of the two hands were in the 0.5-2.5% range. So, the distinction between hands 
appeared to remain ambiguous at this point. For each derivative, we chose the 
hand having the most consistently (repeatedly) better correlation coefficient over 
the range of solvent content values tested. The refined heavy atom sites were input 
to phenix.autosol from the Phenix*! software suite for phase refinement, density 
modification and initial model building. 
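
The rule used above, choosing the hand with the more consistently better trace correlation coefficient across the solvent contents tested, amounts to a simple tally. The sketch below is illustrative: the function name is hypothetical and the CC values are made up, chosen only to mimic the situation described (CCs below 10%, small differences, and a reversal at high solvent content).

```python
from typing import Dict, Tuple


def choose_hand(trace_cc: Dict[float, Tuple[float, float]]) -> str:
    """Pick the hand that gives the higher trace CC at more solvent contents.

    `trace_cc` maps solvent content -> (CC in original hand, CC in inverted hand).
    Returns "original", "inverted", or "ambiguous" when the tally is tied.
    """
    original_wins = sum(1 for orig, inv in trace_cc.values() if orig > inv)
    inverted_wins = sum(1 for orig, inv in trace_cc.values() if inv > orig)
    if original_wins == inverted_wins:
        return "ambiguous"
    return "original" if original_wins > inverted_wins else "inverted"


# Hypothetical trace CCs (%) at solvent contents from 55% to 73%.
example_cc = {
    0.55: (7.1, 5.9),
    0.58: (8.0, 6.2),
    0.61: (7.6, 6.8),
    0.64: (7.3, 6.5),
    0.67: (6.0, 6.9),
    0.70: (5.5, 6.4),
    0.73: (5.8, 6.3),
}
print(choose_hand(example_cc))
```

A tally like this ignores the size of each difference, which is consistent with the text's remark that the distinction remained ambiguous at this stage.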

Attempts to solve the structure based on any single derivative (including 
single isomorphous replacement, single anomalous dispersion and single 
isomorphous replacement with anomalous scattering) were all unsuccessful. 
The combination of phase information from the three derivatives nonetheless 
produced usable maps, allowing us to obtain an experimental BinAB model. Our 
previous inability to distinguish the correct hand for the heavy atom substruc- 
ture for the individual derivatives was overcome in Phenix through use of cross 
difference Fourier techniques, an option that becomes available with multiple 
derivatives (Extended Data Fig. 2a). Briefly, phenix.autosol (using Solve for phase 
combination)“ was used to test various phase combination schemes and to refine 
phases. Although quality maps were obtained by combining anomalous and 
isomorphous signals from all three derivatives (MIRAS), the best map was obtained 
combining the anomalous signal of the gadolinium derivative with the combined 
anomalous and isomorphous differences of the mercury and iodide derivatives. 
Initial figures of merit (FOMs) were 0.12, 0.14 and 0.19 for the PCMBS, Gd and 
VIL substructures, respectively. The FOM of the best phase-combined map was 
0.19 before density modification in Resolve“, and 0.26 after density modification. 
The phase-and-build routine of Phenix was able to trace 296 residues from this 
map, resulting in a model with Rfree and Rfactor of 0.41 and 0.39, respectively 
(Extended Data Fig. 2b). Note that we had to optimize the parameters of 
phenix.autosol to obtain this model; in particular, we specified that the data were weak, that 
we needed to perform a thorough search, and that we needed to use ‘extreme-dm’. 
The FOM of the density-modified map obtained after running this script was 
0.72. From this map, phenix.autobuild was in 30 cycles able to reconstruct 501 
residues, of which 183 were placed, corresponding to Rfree and Rfactor of 0.35 and 
0.32, respectively. 

This model and the corresponding phases were then input to a first round of 
Arp/Warp™, fitting 62 additional residues into the BinA and BinB models (518 
residues, 245 placed) (Extended Data Fig. 2b). This experimental but incomplete 
BinAB model displayed Rfree and Rfactor of 0.32 and 0.29, respectively. The MIRAS 
model then underwent another cycle of model building in phenix.autobuild and 
then a second round of Arp/Warp, resulting in a model with 630 residues, of which 
450 were putatively in sequence. This automatically fitted model was, however, 
unsatisfactory in terms of Rfree, Rfactor and goodness of fit of the placed residues 
in the experimental map. Manual rebuilding was necessary, in order to correct 
wrongly assigned residues and to fit the complete BinAB sequence into the electron 
density maps. In one round of manual fitting, an additional 92 residues were 
built, leading to a total of 722 residues (299 and 423 residues in BinA and BinB, 
respectively) (Fig. 1c and Extended Data Fig. 2c). A cycle of reciprocal space refine- 
ment and yet another cycle of manual rebuilding produced a map which allowed 
us to fit the 59 missing residues of BinA. The final model displays Rfree and Rfactor 
of 0.200 and 0.164, respectively (Extended Data Table 1), with 96.5% of residues 
in favoured regions of the Ramachandran plot (0.5% Ramachandran outliers, 
1.2% rotamer outliers). Of note, we could also phase and build the structure using 
MIRAS phases of all derivatives, that is, without excluding the isomorphous 
differences from Gd, but the initial model turned out to be less complete, further 
highlighting the higher quality of the SAD phases of the gadolinium derivative, as 
compared to SIRAS (Extended Data Fig. 2b). 

Determination of pH 7 structure by molecular replacement. Molecular replace- 
ment for BinAB (strain 2362) was performed with Phaser* using as a starting 
model the activated BinB structure® (PDB ID: 3WA1) from L. sphaericus strain 
2297. BinB from the two strains differ in only five amino acids (A104S, R267K, 
L314Y, F317L, L389M). This search model became available only after we had 
begun our search for heavy atom derivatives. BinB shares only 28% identity with 
BinA. Two copies of BinB were found (corresponding to a solvent content of 68%), 
one of which indeed corresponded to BinB, but the other to BinA. Corresponding 
residues were mutated and fit manually into the electron density maps using 
Coot*®. After each cycle of model rebuilding, reciprocal space refinement 
(including refinement of coordinates and atomic displacement parameters) was 
carried out using phenix.refine"!. The last cycle of refinement was performed using 
Buster“ and included TLS refinement. The Rwork and Rfree of the final model were 
15.8 and 20.3%, respectively, at a resolution of 2.25 Å. Note that residues 1-11 
of BinA and residues 1-34 of BinB were disordered in the electron density map 
and therefore not included in the model. This model is nearly identical to the 
independently built and refined MIRAS model (0.102 Å r.m.s.d. over 773 aligned 
α-carbons; see Supplementary Table 14). The BinAB model obtained by molecular 
replacement was never used in building the BinAB model into the experimental 
map. In fact, the two models were built independently by different authors. The 
similarity between the independently obtained models provided additional 
assurance of model accuracy. 
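
For reference, the r.m.s.d. over aligned α-carbons is the root of the mean squared distance between paired atoms. The sketch below is a minimal illustration with made-up coordinates; it assumes the two coordinate sets are already superposed and paired one-to-one, and it is not the procedure or software the authors used.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation (Å) between paired, pre-superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two hypothetical α-carbon pairs, each displaced by 0.1 Å along z.
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.1), (3.8, 0.0, -0.1)]
example_rmsd = rmsd(model_1, model_2)
print(f"{example_rmsd:.3f} Å")
```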

Structure determination of BinAB at pH 5 and 10. We obtained experimental 
insights into pH-induced conformational changes by calculating structure factor 
amplitude Fourier difference maps ($F_o - F_o$) between the pH 5, pH 7 and pH 10 
data sets. To improve the estimate of structure factor amplitude differences, $F_o - F_o$ 
maps were q-weighted as described** and produced using a CNS” custom-written 
script®’. Application of the q-weighting scheme to the diffraction data sets was 
essential to eliminate noise and amplify the difference signal. Figure 4 shows the 
$F_o^{\mathrm{pH10}} - F_o^{\mathrm{pH7}}$ map ($R_{\mathrm{iso}} = 0.28$) phased with the pH 7 model ($\varphi^{\mathrm{pH7}}$) and highlights 
four regions that are highly sensitive to pH elevation (Fig. 4c-f). Extended Data 
Fig. 8 shows, for each of these regions (Extended Data Fig. 8a-d) and for the 
two BinAB disulfides (Extended Data Fig. 8e, f), the six possible $F_o^{\mathrm{pH}i} - F_o^{\mathrm{pH}j}$, 
$\varphi^{\mathrm{pH}j}$ maps calculated from the pH 5, pH 7 and pH 10 data sets and structures. 
Supplementary Tables 11 and 12, respectively, list all BinA and BinB residues that 
feature peaks higher than ±3.5σ and larger than 2 voxels in the $F_o^{\mathrm{pH10}} - F_o^{\mathrm{pH7}}$, 
$\varphi^{\mathrm{pH7}}$ map. Sequence-wise integrations of the $F_o^{\mathrm{pH10}} - F_o^{\mathrm{pH7}}$, $\varphi^{\mathrm{pH7}}$ and $F_o^{\mathrm{pH5}} - F_o^{\mathrm{pH7}}$, 
$\varphi^{\mathrm{pH7}}$ maps around BinA and BinB are given in Extended Data Fig. 8g and h, 
respectively. 

We phased the pH 5 and pH 10 data sets by molecular replacement with Phaser* 
using as a starting model the refined MIRAS (pH 7) structure. Conformational 
changes with respect to the latter were modelled manually. Reciprocal space refine- 
ment was performed using phenix.refine and included positional, B-factor, TLS 
and occupancy refinements. Rwork and Rfree of the final pH 5 model are 0.211 and 
0.262, respectively, at a resolution of 2.5 Å, with 97.6% of residues in favoured 
regions of the Ramachandran plot (0.4% Ramachandran outliers, 1.5% rotamer 
outliers). Rwork and Rfree of the final pH 10 model are 0.165 and 0.211, respectively, 
at a resolution of 2.4 Å, with 98.9% of residues in favoured regions of the 
Ramachandran plot (0.0% Ramachandran outliers, 0.0% rotamer outliers). 
Difference distance matrices (DDMs) are presented in Extended Data Fig. 9. 
$F_o - F_o$ map integration and DDM calculations were performed using 
custom-written scripts. 
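
A difference distance matrix compares all pairwise inter-atomic distances in one structure with the same distances in another, so it needs no superposition. The authors' scripts are not shown, so the following is only a minimal sketch of the general technique; the function names, the sign convention (second structure minus first) and the toy coordinates are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """All pairwise distances (Å) between the given atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def difference_distance_matrix(
    coords_a: Sequence[Coord], coords_b: Sequence[Coord]
) -> List[List[float]]:
    """DDM[i][j] = d_ij in structure B minus d_ij in structure A.

    Both structures must list the same atoms in the same order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    dist_a = distance_matrix(coords_a)
    dist_b = distance_matrix(coords_b)
    n = len(coords_a)
    return [[dist_b[i][j] - dist_a[i][j] for j in range(n)] for i in range(n)]


# Toy example: the third atom moves 1 Å further along x in structure B.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
ddm = difference_distance_matrix(structure_a, structure_b)
for row in ddm:
    print(["{:+.2f}".format(value) for value in row])
```

Positive entries mark atom pairs that move apart between the two structures and negative entries mark pairs that move closer; unchanged pairs give zero.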

Electrostatic potential maps. pKa values were assigned using PROPKA®! and 
structures protonated in the AMBER force field at pH 7.5 and pH 10.5 using 
PDB2PQR™. Figures were produced using PyMOL*?. 
Extended Data Figure 1 | Data collection and heavy-atom substructure 
determinations. a, The ‘MESH-on-a-stick’ sample injector configuration 
(see Methods). In the three panels, the yellow X below the capillary 
indicates the X-ray path into the page. The middle panel shows a 
closer view of the injector tip; the right panel shows an on-axis view 
of the sample injecting during the experiment. b, Our choice of X-ray 
wavelength for diffraction and MIRAS phasing was a compromise between 
maximizing the heavy atom anomalous signals, f″, as indicated by the 
curves for each element, and maximizing the number of data sets collected 
in the time allotted for the experiment. The grey bar corresponds to the 
wavelength we used, 1.41 Å. c, Difference Patterson maps calculated at 
2.8 Å resolution. Sharpening (−5.0 Å²) was applied to Hg and VIL maps. 
Coefficients for the PCMBS and VIL maps were obtained from both 
isomorphous and anomalous differences. The Gd difference Patterson 
map was calculated from anomalous differences only. Contours start at the 
1.5σ level and continue at 0.5σ intervals. Peaks corresponding to vectors 
between heavy atoms stand out as high peaks, up to 7.5σ. d, Heavy-atom 
sites were located successfully for each of the three derivatives using the 
program SHELXD. We compared the quality of potential heavy-atom 
substructure solutions obtained from two sources of heavy-atom signal: 
single wavelength anomalous dispersion (SAD, red) and a combination 
of anomalous dispersion and isomorphous differences (SIRAS, blue). 
Ten thousand independent trials were performed for each derivative 
and signal source. Each dot in the scatter plots indicates the quality of an 
individual substructure solution. The vertical axis, labelled CC, indicates 
the consistency between the potential solution and the diffraction data as 
the correlation coefficient between normalized structure factors, Ecalc and 
Eobs. The horizontal axis, labelled PATFOM (Patterson figure of merit), 
indicates the consistency between the observed difference Patterson map 
and that predicted by the potential solution. Successful substructure 
determination is suggested by the appearance of a sharp separation 
between two populations of potential solutions: a cluster with lower values 
of CC and PATFOM (incorrect solutions) and a cluster with higher 
values (correct solutions). Such is the case for all the trials performed, 
except for VIL using the SAD signal, where only a single population 
of solutions is observed. Evidently, the SAD signal was insufficient for 
accurate location of iodine sites. For VIL, we relied on the accuracy of 
sites obtained from the SIRAS signal. In most cases, the SIRAS (blue) 
signal is stronger than the SAD signal (red), indicating good isomorphism 
between native and derivative data sets. Only in the case of GdCl3 does the 
SAD signal appear to be better than the SIRAS signal. The histograms in 
the right column indicate the number of potential substructure solutions 
with given values of CFOM (combined figure of merit). The histograms 
recapitulate the trends observed in the scatter plots. e, The correlation 
coefficient (CCiso) measures the agreement and Riso measures the 
discrepancy between the native structure factors and those of each of the 
derivatives. Each of our three derivative data sets shows isomorphism with 
the native data set up to 2.8 Å resolution. 


Extended Data Figure 2 | Structure solution and model building. 
a, Evidence for choosing the correct hand of heavy atom substructures. 
We illustrate here the two types of comparison we used for choosing the 
correct hand of the heavy-atom substructures: (1) comparisons of the 
quality of the SIRAS and SAD phased maps (upper three panels) and 
(2) comparisons of the heights of anomalous difference Fourier peaks 
(lower three panels). These comparisons are made between maps 
calculated in opposite hands; the correct hand is indicated by the 
individual with higher positive values. The disparity in values (Δ) is 
indicated on the vertical axes of the graphs. Greater |Δ| values indicate 
a stronger phasing power and more reliable choice of hand. There are 
six comparisons shown for each of the three heavy-atom derivatives: PCMBS 
(mercury) in red, GdCl3 (gadolinium) in blue, and VIL (iodide) in green. 
The top three panels illustrate the percent difference between hands in the 
mean figure of merit (Mean-FOM), the pseudo-free correlation coefficient 
(Pseudo-free CC) of the density-modified map, and the correlation 
coefficient of the trace (Trace-CC) as reported by ShelxE. The sites and 
phases were obtained from SIRAS signal for mercury and iodide, and 
from SAD signal for Gd. The most probable solvent content is ~59%, 
corresponding to one BinAB complex per asymmetric unit. However, 

we note conflicting choices of hand indicated by fluctuations in the sign 
of A accompanying small variations in the solvent content used in the 
density modification step (horizontal axes). We found that the difference 
Fourier maps (lower three panels) offered a stronger and more consistent 
indication of the choice of hand even when the statistics from SIRAS 

and SAD phased maps themselves differed little between hands. In these 
panels, SIRAS phases from the each heavy atom (3 columns) were used to 
compute three anomalous difference Fourier maps, using as coefficients, 
the anomalous differences from each derivative (rows 4-6). The value 

A corresponds to the height of the highest peak in the map computed in 
the original hand minus the corresponding peak in the map computed 

in the inverted hand. The graphs show that for all three derivatives, the 
original hand choice was correct (indicated by positive A), consistent 
across choices of solvent content (all A have the same sign within a 
graph), and consistent across sources of anomalous differences (all 

A have the same sign within a column). b, Automated tracing and 
model-building. Phases from the three derivatives were combined 
using SOLVE (see Methods). The hands that we decided on during
the phasing step (a) were specified as ‘known’ to phenix.autosol. We then
used RESOLVE (see Methods) to trace the density and build a model.
The upper and lower panels show the progress of model building, 
depending on whether anomalous and isomorphous differences were 
combined from all derivatives (upper panel) or mixed phases (that is, 
anomalous and isomorphous differences from PCMBS and VIL, and 
anomalous differences from the Gd derivative) (lower panel). Each panel 
shows scatter plots of Rfree, Rfactor, number of residues built and number of
residues placed in sequence as a function of the number of cycles. The use 
of mixed phases allowed us to obtain a better model, faster (lower panel). 
c, Electron density maps at various stages of model building. From left to 
right, the panels illustrate progressive improvement in map and model at 
two representative regions of BinAB (upper versus lower panels).
The number of residues built (including residues without side chains)
is noted at each stage, as well as the number of protein atoms built.
The quality of the maps at each stage is reported as a correlation
coefficient with the map obtained from the final model. Approximately
60% of the total atoms in BinAB were built automatically. d, Comparison
of BinA and BinB structures. Superposition of BinA (lighter colours)
and BinB (darker colours) shows similarity between molecules, which
superimpose with an r.m.s.d. of 1.7 Å over 329 pairs of α-carbons.

The ‘face’ view displays the surface involved in the BinA-BinB dimer 
interface, and the barrel subdomain of the trefoil is oriented towards the 
viewer. The back view displays the outward faces of the molecules, with the 
putative carbohydrate binding modules, in the cap subdomain, oriented 
towards the viewer. One of the largest structural differences is located in
a surface loop on the back face, in the trefoil domain (blue). In BinB, a
disulfide bond (Cys67-Cys161, yellow sticks) pins a surface loop (residues
60-74) away from the opening in the trefoil domain (open), whereas in
BinA, the analogous loop (residues 34-46) is stabilized by a different
disulfide bond (Cys31-Cys47, yellow sticks) to take a conformation
that covers the opening in the trefoil domain (closed). e, Structure-based
sequence alignment of BinA, BinB, and Cry35Ab1. The secondary
structures of BinA and BinB are shown above the sequences. Heterodimer 
contacts and cleavage sites are noted. 
Extended Data Figure 3 | Trefoil domains of BinA and BinB.

a, Structural relationships among trefoil domains illustrated by a
phylogenetic tree plot. Four of the structures used for comparison
were identified from a structural similarity search through the Protein
Data Bank conducted by the Dali server (using BinA residues 6-156
as the probe). The top four hits occupy the top half of the plot (3AH1,
3VT2, 2E4M, and 4JP0) and include the deadly toxin, ricin (3VT2). The
remaining structures chosen for comparison (1W3G and 3ZXG) were 
selected based on their membership in the aerolysin family of toxins, of 
which BinA and BinB are members. That is, these are trefoils covalently 
linked to aerolysin-type pore-forming domains. These are highlighted in 
blue text and include another insecticidal protein from B. thuringiensis, 
Cry35Ab1 (4JP0). Note that BinA and BinB are nearly as distant from 
each other as they are from the closest homologues, haemagglutinin, 
ricin, and Cry35Ab1. Carbohydrate molecules are shown in sticks where 
coordinates are available. Notable loop insertions in BinA and BinB are 
coloured in orange and magenta, respectively. b, Carbohydrate-binding 
modules of BinA and BinB display different levels of structural integrity. 
No carbohydrates were included or observed in the crystal structure of
BinAB. To investigate the structural integrity of the putative carbohydrate- 
binding pockets of BinAB, we superimposed coordinates of lectin (1W3G) 
and haemagglutinin (3AH1). The crystal structures illustrated in the left 
column are carbohydrate complexes chosen for their structural similarity 
to BinAB. Some modules appear competent for carbohydrate binding, 
such as the β- and γ-modules of BinA and the 8-module of BinB. Others
show steric clash (yellow starburst), such as the α-module of BinA and the
6-module of BinB, which could be overcome by allowing adjustments in
torsion angles. Notably, the α-module of BinB is completely occluded by
the insertion in its sequence (magenta) and stapled shut by a disulphide
bond. In addition to the canonical α-, β- and γ-binding modules, 3AH1
displays another weakly bound carbohydrate marked site IIIA (bottom
panel). This site is illustrated here because its superimposed coordinates
lie adjacent to Y150 in BinB. The Y150A mutation causes complete loss of
receptor binding.
Extended Data Figure 4 | Pore-forming domains (PFD) of BinA
and BinB. a, Topology of the aerolysin family of pore-forming toxins.
These share a core topology composed of five antiparallel β-strands
and a putative membrane-spanning segment (green). PDB ID codes are
included in parentheses. For clarity, we exclude from this illustration
any accessory domains outside the pore-forming module (PFM) of these
toxins. The PFM is divided into two subdomains: a β-sheet subdomain at
one end (above the horizontal grey line) and a β-sandwich subdomain at
the opposite end (below the horizontal grey line). The length, twist, and
number of strands vary between toxins. Also, the putative
membrane-spanning segment (green) varies widely in secondary structure. However,
in all cases this putative membrane-spanning segment is located between
the second and third strands, suggesting that these toxins might share
a common mechanism of pore formation. b, Members of the aerolysin


family that also contain a β-trefoil domain like BinAB. These are:
Cry35Ab1 toxin from B. thuringiensis (4JP0), lysenin, a haemolytic toxin
from the earthworm Eisenia fetida (3ZXG), and a pore-forming lectin
from the mushroom Laetiporus sulphureus (1W3G). c, Amphipathicity
is evident in the sequence of the putative transmembrane (TM)
subdomains of BinA and BinB. The observed secondary structures of 
BinA and BinB are shown above the sequence alignment. The range of the 
transmembrane subdomain is coloured yellow. Amino acids are coloured 
by hydrophobicity according to the scale given at the bottom. Note the 
alternating hydrophobic-hydrophilic pattern is especially prominent in 
the N-terminal half of the transmembrane subdomain. This pattern is 
consistent with the proposal of an oligomeric membrane-spanning 
β-barrel. The figure was made using the program Jalview.
Extended Data Figure 5 | Overview and analysis of molecular interfaces
in the BinAB crystal. a, Overview of the six molecular interfaces involving 
BinA in the BinAB crystal. The reference copy of the BinA molecule is 
depicted as a beige molecular surface and its six neighbouring molecules 
are shown as cartoon ribbons (upper panels). Face and back views (left and 
right panels) reveal opposite surfaces of the BinA molecule. The largest 
interface is with BinB (x,y,z) which is shown most clearly in the face view 
(left panels) in dark blue. It is the only interface of the six that is large 
enough to stretch over most of the length of the molecule. In all views, the 
pseudo-two-fold axis relating BinA and BinB is in a vertical orientation 
(black line in upper panels). The areas of contact are illustrated on the 
BinA molecular surface (middle panels) in colours corresponding to the 
cartoon ribbons (upper panels). BinA molecules and surfaces are shown in 
beige shades; BinB molecules and surfaces are shown in blue-green shades. 
The pie chart shows the relative amount of total BinA surface area buried 
by each of the six crystal contacts and the remainder, which is solvent 
exposed. b, Overview of the eight molecular interfaces involving BinB in 
the BinAB crystal. The reference copy of the BinB molecule is depicted as 
a dark blue molecular surface and its eight neighbouring molecules are 
shown as cartoon ribbons (upper panels). Face and back views (left and 
right panels) reveal opposite surfaces of the BinB molecule. The largest 
interface is with BinA (x,y,z), shown most clearly in the face view
(left panels) in beige. It is the only interface of the eight that is large
enough to stretch over most of the length of the molecule. In all views,
the pseudo-two-fold axis relating BinA and BinB is in a vertical orientation 
(black line in upper panels). The areas of contact are illustrated on the 
BinB molecular surface (middle panels) in colours corresponding to the
cartoon ribbons (upper panels). BinA molecules and surfaces are shown 
in amber shades; BinB molecules and surfaces are shown in blue-green 
shades. The pie chart shows the relative amounts of total BinB surface area 
buried by each of the eight crystal contacts and the remainder, which is 
solvent exposed. c, Distribution of the BinA-BinB interface area over its 
subdomains. The pie charts in the upper half show the area contributions 
to the principal BinA-BinB interface from each of the five named regions:
trefoil domain, transmembrane subdomain, sheet subdomain, sandwich
subdomain, and combined N- and C-terminal propeptides. The lower
charts show analogous contributions on a per-residue basis. That is,
the area contributed by each region is divided by the total number of
residues comprising that region. These pie charts emphasize the role of
the transmembrane subdomain in the dimer interface, perhaps to restrain
this subdomain from inserting into a membrane until after the BinAB
dimer dissociates. Notably, the higher efficiency of pore formation of
BinA compared to BinB correlates with the greater protection of its
transmembrane domain (12.5 Å² buried per residue versus 6.5 Å² buried
per residue) in the dimer.
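
The per-residue normalization described in panel c can be sketched as follows. This is a minimal illustration only: the region names follow the caption, but the areas and residue counts are invented placeholders, not values from the structure, and the function name is hypothetical.

```python
def buried_area_per_residue(buried_area, n_residues):
    """Divide the interface area contributed by a region (in square angstroms)
    by the number of residues in that region."""
    if n_residues <= 0:
        raise ValueError("a region must contain at least one residue")
    return buried_area / n_residues

# Placeholder inputs: (buried area in square angstroms, residue count) per region.
regions = {
    "trefoil domain": (450.0, 150),
    "transmembrane subdomain": (480.0, 40),
    "sheet subdomain": (200.0, 80),
}

# Normalize every region so that regions of different size can be compared.
per_residue = {
    name: buried_area_per_residue(area, count)
    for name, (area, count) in regions.items()
}
```

Normalizing in this way prevents a large region from dominating the comparison simply because it contains more residues.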
Extended Data Figure 6 | Detailed views of the molecular interfaces
in the BinAB crystal. a-g, BinA and BinB are shown as green and
cyan ribbon diagrams, respectively. The C-terminal propeptide of BinB
(residues 396-448) is highlighted in blue, while the N-terminal (residues
1-10) and C-terminal (residues 354-367) propeptides of BinA are shown
in dark green. Contacting residues are shown as sticks. Polar interactions
within a 3.6-Å cut-off are highlighted by yellow dashes. The contacts
illustrated in panels a-g are detailed in Supplementary Tables 3-9,
respectively. a, Molecular contacts between BinA (x,y,z) (green) and
BinB (x,y,z) (cyan), that is, within the biological dimer. A large part of
this interface involves the C-terminal propeptide of BinB. b, Molecular
contacts between BinA (x,y,z) (green) and BinA (x + 1/2, -y + 1/2, -z)
(lime green). c, Molecular contacts between BinA (x,y,z) (green) and BinB
(x - 1/2, -y + 1/2, -z) (cyan). This interface involves the propeptide
of BinB (residues 396-448). d, Molecular contacts between BinA (x,y,z)
(green) and BinB (-x + 2, y - 1/2, -z + 1/2) (cyan). This interface
involves the propeptide of BinB (residues 396-448). e, Molecular contacts
between BinA (x,y,z) (green) and BinB (-x + 5/2, -y, -z + 1/2) (cyan).
f, Molecular contacts between BinB (x,y,z) (cyan) and BinB (-x + 2,
y - 1/2, -z + 1/2) (teal). A small part of this interface involves the
propeptide of BinB (residues 396-448). g, Molecular contacts between
BinB (x,y,z) (cyan) and BinB (x, y - 1, z) (teal).
Extended Data Figure 7 | Electrostatic complementarity, tyrosine
distribution, predicted electrostatic changes upon pH elevation and 
crystal solubilisation assays. a, b, Electrostatic surface complementarity of 
the BinA-BinB interface. At pH 7.0 (a), complementary charges are 
notable between the BinA electrostatic surface potential (top) and the 
BinB electrostatic surface potential (bottom). The complementarity in 
potential is highlighted by the vertical arrows connecting adjacent patches 
on opposing surfaces of the interface. At pH 10.5 (b), deprotonation
of tyrosine and increased negative charge on acidic residues cause a
reduction in electrostatic complementarity from 0.37 to 0.29
(ref. 60). All panels depict the BinA surface of the BinAB dimer interface.
In the upper panels of a and b, this surface is coloured by electrostatic
surface potential of BinA; in the lower panels, this surface is coloured
by electrostatic surface potential of BinB. Residues lining the interface
(sticks) are labelled with colour corresponding to the domain to which
they belong. The colour scheme is as described in Extended Data Fig. 2a.
BinA residues are labelled in the upper panel. BinB residues are labelled
in the lower panel. In all panels, the pseudo-two-fold axis relating BinA
and BinB is in a vertical orientation (black line in Fig. 3a, lower panel).
c, Distribution of tyrosine residues in the BinAB dimer. Of the total 49
tyrosine residues, 48 are ordered in the crystal structure. Of these, 20%
are located in the dimer interface, which itself accounts for only 10% of
the total molecular surface. Thus, the distribution of tyrosine residues
is slightly more concentrated on the dimer interface compared to the
remainder of the BinAB surface. Tyrosines outside the dimer interface
are probably more prone to deprotonation than those within the dimer
interface, due to differences in solvent accessibility. d, e, Electrostatic
potential map of BinA and BinB. The surface of the BinAB dimer is
depicted coloured by the electrostatic surface potential of BinA on the left, 
and by that of BinB on the right. In d, the regions of BinA (left) and BinB 
(right) that participate in the dimer interface are highlighted. In e, the 
external surface of the dimer is highlighted. In both d and e, the upper and 
lower panels show the electrostatic surface potentials of BinA (left) and 
BinB (right) at pH 7 and pH 10.5, respectively. f, Alkaline-induced crystal 
dissolution is delayed for the BinA D22N mutant compared to the wild 
type. Our structural data suggested that BinA Asp22 was an important pH 
sensor for triggering crystal dissolution at the high pH characteristic of 
the mosquito midgut. We reasoned that a D22N mutation in BinA would 
render the crystal less sensitive to pH by stabilizing a hydrogen bond with 
the BinB C-terminal carboxylate. We constructed a BinA D22N mutant 
and measured solubility of BinA D22N-BinB crystals at pH 10, collecting 
three data points for each time point. We found that its solubility in vitro 
decreased by 30% between 30 and 90 min at pH 10 compared to wild-type 
crystals, but not at pH 7 (individual measurements are plotted to indicate 
the range of variation). After 90 min, crystals of wild-type BinAB and BinA 
D22N-BinB are completely dissolved. This delay in crystal dissolution up 
to the 90-min time point is an important difference because in the fourth- 
instar Culex larvae, the larval feeding rate from the time particles are 
ingested until they are digested and exit the hindgut is 30 min (indicated 
by grey shading). Hence, the 60-min delay that we see in our experiments
with D22N is long enough to contribute to the striking loss of toxicity
of more than 20-fold at the LC95 level (Supplementary Table 13). These
results are consistent with the model of Asp22 serving as a pH sensor for 
crystallization. 
Extended Data Figure 8 | Comparison of $F_o^{\mathrm{pH}i} - F_o^{\mathrm{pH}j}$ maps obtained
from crystals receiving different X-ray doses suggests that the
structural changes observed are due to pH change and not radiation
damage. a-f, The pH 5 and pH 10 data sets were collected with a
~500-fold higher dose than the pH 7 data set, raising the concern that
some of the peaks in the $F_o^{\mathrm{pH10}} - F_o^{\mathrm{pH7}}, \varphi^{\mathrm{pH7}}$ map result from radiation
damage, most notably those observed on disulfides. Panels a-d show, for
each of the four regions identified as highly sensitive to pH elevation in
Fig. 4c-f, the six possible $F_o^{\mathrm{pH}i} - F_o^{\mathrm{pH}j}, \varphi^{\mathrm{pH}j}$ maps calculated from the
pH 5, pH 7 and pH 10 data sets and structures. Panels e and f show these
maps around the disulfides of BinA (e) and BinB (f). BinA and BinB are
shown as cartoons, coloured by subdomain, as in Fig. 4. The cartoons
range in colour from pale to medium to dark, signifying the pH values
5, 7 and 10, respectively. Consistent with the hypothesis that a 500-fold
difference in dose causes no major structural change, we see a lack of
peaks in the $F_o^{\mathrm{pH5}} - F_o^{\mathrm{pH7}}, \varphi^{\mathrm{pH7}}$ map (Riso = 0.26) around disulfides (e, f)
and other pH-sensitive residues (a-d). Consistent with the hypothesis that
the peaks observed in the $F_o^{\mathrm{pH10}} - F_o^{\mathrm{pH7}}, \varphi^{\mathrm{pH7}}$ map (Riso = 0.23) are caused
by pH change, these peaks are reproduced in the $F_o^{\mathrm{pH10}} - F_o^{\mathrm{pH5}}, \varphi^{\mathrm{pH5}}$ and
$F_o^{\mathrm{pH5}} - F_o^{\mathrm{pH10}}, \varphi^{\mathrm{pH10}}$ maps (Riso = 0.35). We interpret this pattern of peaks
as implying movement of the disulfide bonds rather than their disruption.
This movement accompanies pH-sensitive rigid-body motion of the trefoil
domains (Extended Data Fig. 9). g, h, Peaks stronger than ±3.5σ were
integrated contiguously in the $F_o^{\mathrm{pH10}} - F_o^{\mathrm{pH7}}, \varphi^{\mathrm{pH7}}$ (upper panels) and
$F_o^{\mathrm{pH5}} - F_o^{\mathrm{pH7}}, \varphi^{\mathrm{pH7}}$ maps (lower panels) and then assigned to the closest
residue. The secondary structures of BinA (g) and BinB (h) are shown
as cartoons, coloured by subdomain as in Fig. 4. The background of the
sequence is also coloured by subdomain. The sequence-wise integration
of the $F_o^{\mathrm{pH10}} - F_o^{\mathrm{pH7}}, \varphi^{\mathrm{pH7}}$ map reveals that BinB is more affected by the
pH elevation than BinA, and in both chains, the trefoil is more affected
than the pore-forming domain. The propeptide and transmembrane
regions of both proteins are also sensitive to pH elevation. Peaks in the
$F_o^{\mathrm{pH5}} - F_o^{\mathrm{pH7}}, \varphi^{\mathrm{pH7}}$ map (lower panels) are smaller in magnitude and
concentrated in the trefoil domain of BinA and the C-terminal propeptide
of BinB. They correspond to side-chain reorientation rather than increased
dynamics or domain motion (Extended Data Fig. 9). The marked
difference in pattern between the $F_o^{\mathrm{pH10}} - F_o^{\mathrm{pH7}}, \varphi^{\mathrm{pH7}}$ and $F_o^{\mathrm{pH5}} - F_o^{\mathrm{pH7}}, \varphi^{\mathrm{pH7}}$
map integrations is consistent with the hypothesis that the peaks
observed in these maps are not due to radiation damage, but rather to
pH-induced conformational changes.
Extended Data Figure 9 | Conformational changes in the BinAB dimer
upon pH elevation from 7 to 10. a-d, Distance difference matrices
(DDMs) calculated between the pH 7 (reference) structure and either the
pH 10 or the pH 5 structure. Blue and red indicate decreases and increases
in Cα-Cα distances in the pH 10 or pH 5 structures as compared to the
pH 7 structure, respectively. The secondary structures of BinA (a-c) and
BinB (a, b, d) are recapitulated by cartoons on the side or the diagonal of
the DDMs. These cartoons are coloured by subdomain as in Fig. 4.
a, Intermolecular (BinA versus BinB) DDM between the pH 10 and the
pH 7 structures. This DDM illustrates that the BinAB dimer contracts
upon pH elevation, with the two trefoil domains coming closer to one
another. This might be due to electrostatic repulsion at crystal contact
zone 5 (Fig. 4e, Extended Data Fig. 6e and Supplementary Table 7), which
involves the trefoils of BinA and BinB from two symmetry-related dimers.
b, Intermolecular (BinA versus BinB) DDM between the pH 5 and the
pH 7 structures. The pH 5 structure is overall slightly more compact than
the pH 7 structure but shows no major conformational changes.
c, d, Intramolecular DDMs of BinA (c) and BinB (d). Changes in Cα-Cα
distances between the pH 10 and the pH 7 structures are reported below
the diagonal, while those between the pH 7 and the pH 5 structures
are shown above the diagonal. The pH 5 and pH 7 structures of BinA
(c) and BinB (d) are overall similar, with only the BinA loop Ile110-
Arg120 and BinB loop Lys175-Ser184 showing a noticeable difference in
conformation. In contrast, the pH 10 structures of BinA (c) and BinB (d)
appear more compact. On the local level, striking conformational changes
are observed upon pH elevation in the N-terminal propeptide of BinA,
in loops Ile110-Thr120 (trefoil) and Asn341-Tyr345 (PFD) of BinA, and
in loop Lys175-Ser184 (trefoil) of BinB. The increase in compactness
is due to the trefoil domain coming closer to the PFD in both BinA and
BinB. BinA loop Ile110-Thr120 appears sensitive to both increases and
decreases in pH. e, f, Porcupine plots depicting differences between
structures of BinAB for pH 7 versus pH 5 (green arrows) and pH 7 versus
pH 10 (red arrows). The pH 7 structure of BinAB is shown, coloured
by subdomain as in Fig. 4. The movement of Cα atoms is indicated by
arrows on the ribbon representation, with the magnitude of motions
illustrated by length of arrows exaggerated by 2.5 Å to increase visibility
(for all atoms that move by more than 0.1 Å). e, View of the BinAB dimer,
in an orientation similar to Fig. 4b. As compared to Fig. 3a, b, this view is
rotated by 180° around the vertical axis. f, View from the top of the trefoil
domains; this face of the BinAB dimer is presumably that interacting with
the apical membrane of larvae midgut cells. The view in f is 90° apart from
that in e.
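
For readers unfamiliar with distance difference matrices, the sketch below shows the calculation for a single chain: all pairwise Cα-Cα distances are computed in the reference and target structures, and the reference matrix is subtracted from the target matrix. It is a minimal illustration under stated assumptions (equal-length, identically ordered coordinate lists; hypothetical function names; toy coordinates), not the procedure used to prepare the figure. An intermolecular DDM would instead pair each Cα of one chain with each Cα of the other.

```python
import math

def distance_matrix(coords):
    """All pairwise distances between a list of (x, y, z) positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def distance_difference_matrix(reference, target):
    """Target-minus-reference pairwise distances.

    Negative entries mean a pair of atoms is closer in the target
    structure; positive entries mean it is further apart.
    """
    if len(reference) != len(target):
        raise ValueError("structures must contain the same atoms")
    d_ref = distance_matrix(reference)
    d_tgt = distance_matrix(target)
    n = len(reference)
    return [[d_tgt[i][j] - d_ref[i][j] for j in range(n)] for i in range(n)]

# Toy coordinates for three atoms (in angstroms); atom 2 moves 1 A towards atom 1.
reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
target = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
ddm = distance_difference_matrix(reference, target)
```

Because the matrix is built from internal distances only, it does not depend on how the two structures are superimposed.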
| | Native, pH 7 [a] | PCMBS [b] | Gd [c] | VIL [d] | Native, pH 10 [e] | Native, pH 5 [f] |
|---|---|---|---|---|---|---|
| PDB code | 5FOY | | | | 5FOZ | 5G37 |
| Data collection | | | | | | |
| Space group | P212121 | P212121 | P212121 | P212121 | P212121 | P212121 |
| Cell dimensions a, b, c (Å) | 86.9, 97.4, 128.3 | 87.1, 97.2, 128.4 | 87.0, 97.5, 128.1 | 86.7, 97.3, 127.7 | 86.7, 97.3, 127.7 | 86.8, 97.0, 127.0 |
| Cell dimensions α, β, γ (°) | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 |
| Wavelength (Å) | 1.212 | 1.408 | 1.408 | 1.408 | 1.459 | 1.459 |
| X-ray beam focus (µm) | 1.3 | 3. | 1.3 | 1.3 | 0.25 | 0.25 |
| Photons/pulse (× 10^11) | 1.9 | 7.2 | 7.2 | 7.2 | 6.4 | 6.4 |
| Pulse duration (fs) | 31.9 | 45.1 | 45.1 | 45.1 | 40.8 | 40.8 |
| Absorbed dose (GGy) [g] | 0.03 | 0.15 | 0.15 | 0.15 | 14.4 | 14.4 |
| Number of collected frames | 312,659 | 761,052 | 431,986 | 264,770 | 371,386 | 493,979 |
| Number of indexed patterns | 41,206 | 184,091 | 131,137 | 42,537 | 27,792 | 17,099 |
| Number of indexed images accepted by cctbx.prime | 40,794 | 170,616 | 123,484 | 39,789 | 26,022 | 16,224 |
| Resolution (Å) [h] | 43.5-2.25 (2.29-2.25) | 43.5-2.40 (2.44-2.40) | 43.5-2.35 (2.39-2.35) | 43.5-2.60 (2.64-2.60) | 43.5-2.37 (2.44-2.37) | 43.5-2.50 (2.54-2.50) |
| Number of observations [i] | 12,854,588 | 53,361,546 | 39,314,393 | 11,249,082 | 12,069,895 | 5,495,157 |
| I/σ(I) | 2.4 (0.5) | 2.8 (0.7) | 3.1 (0.6) | 3.7 (0.9) | 2.7 (0.9) | 6.34 (2.46) |
| CC1/2 | 96.8 (26.2) | 98.6 (16.2) | 98.4 (3.5) | 96.3 (17.5) | 97.3 (34.1) | 86.8 (28.0) |
| Completeness (%) | 99.7 (99.7) | 99.8 (96.9) | 99.8 (97.0) | 99.4 (96.2) | 99.9 (100.0) | 99.9 (89.4) |
| Multiplicity | 65.9 (4.8) | 372 (4.3) | 260.1 (4.4) | 100.4 (4.7) | 88.8 (19.2) | 76.62 (8.61) |
| Refinement | | | | | | |
| Refinement target function | Phased maximum-likelihood | | | | Maximum-likelihood | Maximum-likelihood |
| Resolution (Å) | 2.25 (2.30-2.25) | | | | 2.40 (2.46-2.40) | 2.50 (2.56-2.50) |
| Number of reflections | 52379 (3523) | | | | 42817 (2923) | 37785 (2557) |
| Rwork / Rfree | 0.164 (0.288) / 0.200 (0.327) | | | | 0.165 (0.264) / 0.211 (0.311) | 0.211 (0.357) / 0.262 (0.427) |
| Number of atoms: protein | 6479 | | | | 6350 | 6415 |
| Number of atoms: water | 571 | | | | 730 | 720 |
| B-factors (Å²): protein (BinA/BinB) | 47.5 / 40.7 | | | | 52.8 / 47.3 | 39.2 / 36.8 |
| B-factors (Å²): water | 50.8 | | | | 62.4 | 41.01 |
| R.m.s. deviations: bond lengths (Å) | 0.006 | | | | 0.007 | 0.002 |
| R.m.s. deviations: bond angles (°) | 1.359 | | | | 0.8 | 0.5 |

[a] 41,206 crystals were used to produce the native, pH 7 data set.
[b] 184,091 crystals were used to produce the PCMBS data set.
[c] 131,137 crystals were used to produce the Gd data set.
[d] 42,537 crystals were used to produce the VIL data set.
[e] 27,792 crystals were used to produce the native, pH 10 data set.
[f] 17,099 crystals were used to produce the native, pH 5 data set.
[g] As calculated by RADDOSE-3D. Note that these dose calculations do not yet take into account the escape of photoelectrons from the diffracting volume. The track length in protein and
water of photoelectrons has been estimated at 3 µm for 1 Å wavelength X-rays, suggesting that most photoelectrons travel outside of the diffracting volume without losing all their energy
within it (crystals are 0.25 × 0.35 × 0.75 µm³ on average). Although these are not the only electrons to contribute to the damage, a preliminary photoelectron escape model incorporated into
an unreleased version of RADDOSE-3D suggests that the absorbed dose is overestimated by a factor of ~100 (J. Brooks-Bartlett and E. Garman, personal communication) for these small crystals.
[h] Values in parentheses are for highest-resolution shell.
[i] Including negative intensity observations.
Arginine phosphorylation marks proteins
for degradation by a Clp protease


Débora Broch Trentini1*, Marcin Jozef Suskiewicz1*, Alexander Heuck1, Robert Kurzbauer1, Luiza Deszcz1, Karl Mechtler1,2 &
Tim Clausen1

Protein turnover is a tightly controlled process that is crucial for the removal of aberrant polypeptides and for cellular 
signalling. Whereas ubiquitin marks eukaryotic proteins for proteasomal degradation, a general tagging system for 
the equivalent bacterial Clp proteases is not known. Here we describe the targeting mechanism of the ClpC-ClpP 
proteolytic complex from Bacillus subtilis. Quantitative affinity proteomics using a ClpP-trapping mutant show that 
proteins phosphorylated on arginine residues are selectively targeted to ClpC-ClpP. In vitro reconstitution experiments 
demonstrate that arginine phosphorylation by the McsB kinase is required and sufficient for the degradation of substrate 
proteins. The docking site for phosphoarginine is located in the amino-terminal domain of the ClpC ATPase, as resolved 
at high resolution in a co-crystal structure. Together, our data demonstrate that phosphoarginine functions as a bona 
fide degradation tag for the ClpC-ClpP protease. This system, which is widely distributed across Gram-positive bacteria, 
is functionally analogous to the eukaryotic ubiquitin-proteasome system. 


Proteins destined for degradation are removed by energy-dependent
proteases such as the eukaryotic proteasome or the bacterial Clp
proteolytic complexes. In these tightly regulated protein shredders, the
proteolytic sites are sequestered within an inner chamber that is only
accessible through axial entrance gates. The gates are in turn guarded
by regulatory AAA (ATPases associated with diverse cellular activities)
complexes that are responsible for recognizing substrate proteins as
well as for unfolding and translocating them into the protease cage.
In eukaryotes, substrate selection depends on a range of ubiquitin ligases
that mark substrates with a polyubiquitin tag, a degradation signal
recognized by the AAA regulatory particle of the 26S proteasome.
An analogous system involving the protein modifier Pup targets
substrates to the eukaryotic-like core proteasome present in mycobacteria
and closely related species. However, it is not known whether the
ATP-dependent Clp proteases, which are found in almost all bacteria,
require a general post-translational tagging system. The Clp complexes
are thought to use the N-terminal domains (NTD) of the AAA ATPases
(ClpA, ClpC, ClpE or ClpX) to recognize specific degradation motifs,
known as degrons, which are typically located at the N- or C-terminal
ends of target proteins. These degrons can also be introduced by the
specialized SsrA tagging system, which is used for rescuing stalled
ribosomes. Alternatively, substrate recruitment may be aided by adaptor
proteins that tether selected substrate proteins to the Clp proteolytic
complex, thus facilitating their degradation.

In B. subtilis and other Gram-positive bacteria, the ClpC-ClpP
(ClpCP) protease, which is constituted by the AAA unfoldase ClpC and
the protease ClpP, is an important proteolytic machine for eliminating
unfolded and aggregated proteins. The ClpCP proteolytic complex is
under the control of McsB, the founding member of a class of protein
kinases targeting arginine residues. First, McsB controls the amounts
of ClpCP in the cell by phosphorylating and inhibiting the
transcriptional repressor CtsR, which in turn regulates clpC and clpP gene
expression. Second, McsB has been reported to function as an adaptor
protein of ClpC by stimulating its ATPase activity and promoting
degradation of the CtsR substrate. In addition to regulating CtsR
and ClpC, McsB phosphorylates hundreds of diverse proteins in vivo,
as revealed by B. subtilis phosphoproteomic analyses. This promiscuous
activity suggested a more general function of the protein arginine
kinase in the stress response of Gram-positive bacteria.


ClpCP degrades pArg proteins in vivo 

Arginine residues are frequently observed at molecular interfaces 
crucial for protein folding and assembly”. Therefore, arginine phospho- 
rylation, resulting in a net-charge inversion, is predicted to have a strong 
effect on protein stability. Of note, the kinase catalysing this reaction, 
McsB, has many substrates in vivo and is transcriptionally co-regulated 
with ClpP, the major protease of B. subtilis. We thus proposed that 
arginine phosphorylation may have a direct role in the degradation 
of aberrant proteins. To test this assumption, we monitored the fate 
of phosphoarginine (pArg) proteins in vivo by expressing an inactive 
trapping variant of the ClpP protease (Ser98Ala, ClpP'®*; refs 10, 21). 
Substrates captured within the protease cage can be co-purified and 
analysed by mass spectrometry (MS). To perform the pull-down exper- 
iments in the wild-type B. subtilis background, we engineered a ClpP 
mutant that does not interact with the endogenous, active protease. For 
this purpose, we exchanged residues of an ion pair at the interface of the 
ClpP heptamer. The resulting cross mutant (Glu119Arg/Arg142Glu,
ClpP^X) did not form heterooligomers with wild-type ClpP, but maintained
the ability to assemble a substrate-trapping cage (Extended Data
Fig. 1). 

Quantitative MS analysis of ClpP pull-downs from heat-shocked 
bacteria (Extended Data Fig. 2a, b, strategy illustrated in Fig. 1a) revealed a
large number of proteins that were specifically captured by the ClpP^X-TRAP
mutant (Fig. 1b and Supplementary Table 1). Despite the technical diffi- 
culties in identifying arginine phosphorylations’’, we detected 13 pArg 
proteins among the 233 isolated ClpP substrates (Extended Data Table 1, 
Supplementary Table 2). Taking into account the functional connec- 
tion between McsB and ClpC, which are found in the same operon, we 
next asked whether pArg proteins are transferred into the ClpP cage by 
ClpC. To this end, we performed ClpP^X-TRAP pull-down analyses from

Figure 1 | Pull-down of ClpP trapping mutants. a, Cartoon illustrating
the experimental workflow. As indicated, all pull-down experiments were
done in triplicate. LC-MS/MS, liquid chromatography tandem mass
spectrometry. b, Volcano plot illustrating proteins identified in ClpP^X
(control) and ClpP^X-TRAP pull-downs after expression in a B. subtilis
wild-type (WT) strain. Proteins were considered as ClpP substrates
(shaded area) when the X-TRAP/control relative protein intensity
(x axis) was >2 and the corresponding limma P value (y axis) was
<0.05. Phosphorylated proteins are shown as filled squares (red: pArg,
blue: pSer/Thr/Tyr). In a few cases, the phosphorylated residue could not be
unambiguously localized. As the same phosphopeptides have been observed
to contain a pArg in previous experiments, they are labelled as probably
pArg (open red squares). Identified pArg proteins are listed on the left.
c, Volcano plots of the ClpP pull-downs performed in B. subtilis wild-type
and ΔclpC strains in parallel. For comparison, pArg proteins identified in
the B. subtilis wild-type pull-downs are marked in orange in the ΔclpC plot.


wild-type and clpC knockout (ΔclpC) cells in parallel. Whereas we
observed 14 pArg proteins in the pull-downs performed in wild-type
cells, the pull-downs in ΔclpC cells revealed only a single pArg substrate
(Fig. 1c, Extended Data Table 1). Of note, this substrate, CtsR, was
also shown to be targeted to ClpP by the ClpX and ClpE unfoldases”””?,
The almost complete absence of ClpP-trapped pArg proteins in ΔclpC
bacteria is even more remarkable, as the deletion of ClpC increases
the overall amounts of pArg proteins. Despite the presence of YwlE, a
highly active arginine phosphatase preventing pArg identification in
B. subtilis wild-type cells'>!?4, phosphoproteomics analysis of ΔclpC
cell lysates revealed 25 pArg sites (Supplementary Table 3). This finding
highlights the active role of ClpC in directing pArg proteins to ClpP-
dependent proteolysis. Consistent with the proposed model, ΔclpP
B. subtilis cell extracts also accumulated pArg proteins (Supplementary
Table 4). To estimate the fraction of pArg proteins among the ClpP
substrates, we analysed the overlap of the ClpP degradome (as defined
by our pull-down experiments) and the pArg proteome (sites detected
previously'*!*?° and in the ΔclpP mutant strain). Accordingly, 25% of
the proteins degraded by ClpP are substrates of McsB and thus potential
candidates of the pArg-dependent degradation pathway (see also
Supplementary Discussion and Extended Data Fig. 2c).
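
The two computational steps described above, calling ClpP substrates from the pull-down data and intersecting them with the pArg proteome, can be sketched in a few lines of Python. This is a minimal illustration only: the thresholds (ratio > 2, P < 0.05) are those stated in the Fig. 1 legend, whereas the record type `PullDownHit`, the function names and every protein name and value below are hypothetical and are not taken from the study's data or software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullDownHit:
    """One quantified protein (hypothetical record layout)."""
    protein: str
    ratio: float    # X-TRAP / control relative protein intensity
    p_value: float  # limma P value


def call_substrates(hits, min_ratio=2.0, max_p=0.05):
    """Names of proteins passing both thresholds from the Fig. 1 legend."""
    return {h.protein for h in hits if h.ratio > min_ratio and h.p_value < max_p}


def overlap_fraction(degradome, parg_proteome):
    """Fraction of degradome proteins that are also in the pArg proteome."""
    if not degradome:
        return 0.0
    return len(degradome & parg_proteome) / len(degradome)


# Toy values for illustration only, NOT measurements from the study.
hits = [
    PullDownHit("protA", 8.0, 0.001),
    PullDownHit("protB", 3.5, 0.020),
    PullDownHit("protC", 1.4, 0.010),  # enrichment too low
    PullDownHit("protD", 5.0, 0.200),  # not significant
    PullDownHit("protE", 2.6, 0.049),
    PullDownHit("protF", 4.1, 0.003),
]
parg_proteome = {"protB", "protC", "protX"}

substrates = call_substrates(hits)
print(sorted(substrates), overlap_fraction(substrates, parg_proteome))
```

Representing both protein lists as sets keeps the overlap a single intersection; the denominator is the degradome, matching the way the fraction is phrased in the text (share of ClpP substrates that are also McsB substrates).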


Protein phosphorylation stimulates ClpCP 

To analyse how ClpC selects pArg-containing substrates, we reconsti- 
tuted the ClpCP-McsB system in vitro. McsB was previously described 
as an adaptor of ClpC targeting the transcriptional repressor CtsR for

Figure 2 | Effect of McsB on the activity of ClpCP in vitro. a, ClpCP-
mediated degradation of β-casein in the presence of different effector
proteins. Here and in the following, a quantification of the β-casein
band is presented (original SDS-PAGE gels in Supplementary Fig. 2).
Bottom panel shows the kinase activity of assayed McsB variants in an
autoradiography plot. AU, arbitrary units. b, In contrast to active McsB, the
inactive McsB^E212A kinase cannot stimulate casein degradation by ClpCP.
c, Effect of the YwlE arginine phosphatase on ClpCP activation by McsBA.
The inactive YwlE^D118N mutant was used as a negative control.


ClpCP-mediated proteolysis (refs 17, 18). Although the kinase activity of McsB
was shown to be required for CtsR degradation, it was not clear whether
McsB itself, the substrate or the ClpCP protease became phosphorylated,
and how this phosphorylation event enhanced protease activity.
We thus recapitulated the corresponding kinase and protease assays
using the intrinsically unfolded protein β-casein as a model substrate.
We observed that the ClpCP protease complex alone was not active.
However, in the presence of MecA, a well-characterized ClpC adaptor”,
the substrate was efficiently hydrolysed (Fig. 2a). Similarly, McsB
induced the degradation of β-casein by ClpCP. The stimulatory effect
was enhanced by the McsB activator McsA?’, which showed no effect
on casein degradation by itself (Fig. 2a). To test whether the kinase
activity of McsB is required for β-casein degradation, we used an inactive
mutant of McsB (Glu212Ala; ref. 13), and, in parallel, probed the
effect of the YwlE arginine phosphatase. Both kinase inactivation and
phosphatase addition prevented substrate degradation (Fig. 2b, c),
highlighting the importance of the arginine kinase activity of McsB for
activating the ClpCP protease. This functional coupling is also reflected
in the different kinase activities of the tested McsB variants (Fig. 2a).


pArg is a degradation tag for ClpCP 

To explore the stimulatory effect of McsB further, we performed
degradation assays in the presence of the free amino acid phosphoarginine
(pArg^AA, in which ‘AA’ denotes the amino acid). We reasoned
that pArg^AA may compete with, and thus reveal, the pArg-dependent
activation event. When incubated with McsB and ClpCP, pArg^AA
reduced the rate of β-casein degradation (Fig. 3a). We next explored
the influence of pArg^AA on β-casein degradation by the ClpCP-MecA
complex that should operate in a phosphorylation-independent manner.
Unexpectedly, however, pArg^AA also inhibited (and to a greater
extent) the activity of the MecA-stimulated ClpCP protease (Fig. 3b).
By contrast, unphosphorylated arginine or phosphate did not block
degradation. As it is known that effector proteins such as MecA dock
to the NTD of AAA unfoldases, we monitored how pArg^AA influences
this interaction (Fig. 3c). A pull-down experiment using the NTD of
ClpC (ClpC^NTD) revealed that pArg^AA inhibits the association between

Figure 3 | Binding of pArg^AA to ClpC. a, b, Inhibitory effect of pArg^AA on
McsBA-activated (a) and MecA-activated (b) ClpCP. AU, arbitrary units.
c, Pull-down experiment monitoring the interaction of the NTD (wild
type and E32A/E106A mutant) with MecA (10 µM each) in the presence
of pArg^AA. d, ITC profile of pArg^AA binding to full-length double Walker
B mutant ClpC (ClpC^DWB; left) and the NTD of ClpC (ClpC^NTD;
right). Determined K_D values are indicated.


MecA and ClpC, probably by competing with MecA for the same binding
site. When probing the direct interaction between pArg^AA and ClpC
by isothermal titration calorimetry (ITC), we measured dissociation
constants (K_D values) of 60 µM for full-length ClpC and 13 µM for the
isolated NTD (Fig. 3d). The pronounced specificity for pArg^AA was
confirmed by measuring the interaction with related compounds,
which did not bind (pTyr^AA, arginine) or did so only weakly (phosphate).
Moreover, we observed that pArg^AA did not bind to MecA, and
had a very low affinity (K_D > 1 mM) for McsB (Extended Data Fig. 3).
Finally, we tested the binding of pArg^AA to the NTD of ClpA, the closest
homologue of ClpC in Gram-negative bacteria, which lack a protein
arginine kinase. Because no binding was observed (Extended Data
Fig. 3), the ability to recognise pArg^AA with high specificity seems to
be a unique property of the ClpC unfoldase.
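
To give a feel for what dissociation constants of 60 µM and 13 µM mean in practice, the following sketch converts a K_D into a fractional site occupancy and a standard binding free energy. It is an illustrative calculation, not part of the reported analysis, and rests on assumptions that are not stated in the text: a single independent binding site, free ligand concentration approximately equal to total ligand, and a temperature of 25 °C. The function names are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def fraction_bound(ligand_uM, kd_uM):
    """Single-site occupancy: [L] / (K_D + [L]), both in micromolar."""
    if ligand_uM < 0 or kd_uM <= 0:
        raise ValueError("ligand must be >= 0 and K_D must be > 0")
    return ligand_uM / (kd_uM + ligand_uM)


def delta_g_kj_per_mol(kd_uM, temp_K=298.15):
    """Standard binding free energy, R*T*ln(K_D / 1 M), in kJ/mol.

    temp_K defaults to 25 degrees C, which is an assumption of this sketch.
    """
    kd_molar = kd_uM * 1e-6
    return R * temp_K * math.log(kd_molar) / 1000.0


for label, kd in (("full-length ClpC", 60.0), ("isolated NTD", 13.0)):
    occupancy = fraction_bound(100.0, kd)  # 100 uM free pArg, chosen arbitrarily
    print(f"{label}: K_D = {kd} uM, occupancy at 100 uM = {occupancy:.2f}, "
          f"dG = {delta_g_kj_per_mol(kd):.1f} kJ/mol")
```

At a ligand concentration equal to K_D the occupancy is one half by definition, so the lower K_D of the isolated NTD corresponds to half-saturation at a roughly fivefold lower pArg^AA concentration than for full-length ClpC.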

The pArg-binding site of ClpC could have two possible functions 
in protein degradation. It could serve as a docking site for the auto- 
phosphorylated form of McsB, which functions as an adaptor, or, 
alternatively, it could directly recognize pArg-containing substrates. To 
distinguish between these two possibilities, we enzymatically prepared
pArg-modified β-casein (casein^pArg) (Fig. 4a and Extended Data Fig. 4).
A pull-down assay showed that casein^pArg, but not unphosphorylated
casein, binds to the ClpC^NTD (Fig. 4b). Because inhibition of the
casein^pArg-NTD association required an excess of pArg^AA, it seems that
a pArg residue in a protein context interacts more strongly with the
NTD than the free phospho amino acid. When testing casein^pArg as
a direct ClpCP substrate, we observed that the ClpCP protease could
degrade casein^pArg even in the absence of McsB or MecA (Fig. 4c).
These data suggest that the assembly of the functional ClpCP protease
can proceed without adaptor proteins. To corroborate this surprising
finding, we performed ClpC and ClpP pull-down experiments after
incubation with substrate. Consistent with the results of the degradation
assays, we observed the formation of a transient ClpCP complex in

Figure 4 | ClpCP protease activity towards a pArg-containing substrate
protein. a, Preparation of the casein^pArg model substrate. SEC, size
exclusion chromatography. b, Binding of casein^pArg (35 µM) to ClpC^NTD
(10 µM) at increasing amounts of pArg^AA. c, Degradation of casein^pArg by
ClpCP without adaptor proteins and the inhibitory effect of pArg^AA on this
activity. d, Pull-down experiment monitoring ClpCP complex formation
in the presence of MecA, casein and casein^pArg. e, Preparation of substrate
samples that contain increasing amounts of casein^pArg after prolonged
incubation with McsBA. ClpCP degradation of resultant casein^pArg samples
is directly correlated to the degree of substrate phosphorylation seen in
pArg immunoblots (Extended Data Fig. 3e).


the presence of casein^pArg, even to a higher degree than in the presence
of MecA. Conversely, unphosphorylated casein could not promote 
ClpCP assembly (Fig. 4d). Together, these data show that the pArg 
modification is crucial for recruiting substrates to the NTD of ClpC 
and for promoting assembly of the functional ClpCP protease complex. 

To confirm the role of pArg as a degradation signal for the ClpCP 
protease, we analysed the digestion of substrate proteins that were arginine- 
phosphorylated to different degrees. For this purpose, we pre-incubated 
β-casein with McsB for increasing time intervals (Fig. 4e). ClpCP degradation
assays of the resulting casein/casein^pArg mixtures showed a
direct correlation between the amount of phosphorylated substrate
and the extent of degradation (Fig. 4e and Extended Data Fig. 4e).
Consistently, adding YwlE phosphatase to the pArg-modified substrates
abolished their degradation. These results unambiguously demonstrate
that ClpCP selectively degrades casein^pArg and does not recognize
pArg-less proteins as substrates. 


The pArg docking sites of ClpC 

To visualize how pArg binds to ClpC, we performed co-crystallization 
experiments of the NTD from B. subtilis ClpC with pArg^AA. The
co-crystal structure was determined at 1.6 Å resolution (Extended
Data Table 2) and, consistent with the symmetrical nature of the NTD
protein fold’® (Fig. 5a), revealed two almost identical pArg-binding 
sites (Fig. 5b, c). Mapping the electrostatic potential of the NTD 
on its molecular surface illustrates the ‘bipolar’ architecture of the 
pArg-binding site that distinguishes it from pSer/Thr or pTyr binding 
sites (Extended Data Fig. 5). Such organization is perfectly suited for 
simultaneously recognizing the positively charged guanidinium and 
the negatively charged phosphoryl group. The functional importance 
of the pArg recruitment is reflected by the exact conservation of the 
interacting residues in ClpC proteins from other Gram-positive species 
(Extended Data Fig. 6). Since the structural data suggest that the ClpC 
hexamer has 12 pArg-docking sites, we asked how many pArg tags 
per substrate are required for degradation. Native MS analysis of the

Figure 5 | Crystal structure of ClpC^NTD in complex with pArg. a, Overall
structure of the ClpC^NTD domain bound to two pArg^AA molecules. The
cartoon representation is coloured in light grey (residues 70-148) and
dark grey (4-69) to highlight the two symmetrical halves of the NTD.
b, c, Zoomed view of pArg-binding sites 1 (pArg1; b) and 2 (pArg2; c),
with labelled interacting residues. The 2Fo − Fc omit electron densities of
the pArg^AA ligands, calculated at 1.6 Å resolution, are contoured at 1σ.
d, Overlap of MecA- and pArg-binding sites. Shown is the hexameric
organization of the ClpC!~**° (grey) complex with MecA”*!~?!8 (blue) 
(PDB code 3PXG; ref. 29) superimposed with pArg^AA (orange). Bottom
left, zoomed-in view shows two adjacent ClpC^NTD domains with pArg^AA


casein^pArg sample that resulted from prolonged incubation with McsB
and that was completely degraded after ClpCP incubation revealed a 
mixture of mono- and di-phosphorylated molecules (Extended Data 
Fig. 7). Given the efficiency of also cleaving less phosphorylated sub- 
strates (Fig. 4f), we suppose that proteins carrying a single pArg mark 
can be degraded by ClpCP. 

Notably, the identified pArg-binding sites match the MecA-binding
grooves observed in the MecA-ClpC complex (ref. 29), with the phosphoryl
moieties of pArg^AA binding in place of the MecA glutamate residues 184
and 198 (Fig. 5d). This overlap explains the inhibition of MecA-ClpCP
by pArg^AA, as observed in our functional studies. Importantly, pArg^AA
binds to ClpC glutamate residues 32 and 106, which in the MecA-ClpC
complex remain unbound. Therefore, the bipolar architecture of the
pArg-binding sites can only be fully exploited by the phosphoguanidinium
moiety, whereas the MecA glutamate residues only dock
into the positively charged half. To test the structurally characterized
binding mode experimentally, we prepared NTD mutants carrying a
Glu32Ala/Glu106Ala double mutation (EA) and measured the interaction
with MecA and pArg^AA. As predicted, mutating the two glutamate
residues abolished pArg^AA binding (Extended Data Fig. 3) but did not
impair the interaction with MecA (Fig. 3c). Consistent with the binding
data, the corresponding ClpCP^EA protease efficiently degraded protein
substrates with the help of MecA, but failed to degrade substrates in a 
pArg-dependent manner (Fig. 5e, f). To confirm the selective failure 
in recognizing and degrading pArg protein, we measured the ClpC 
ATPase activity, which, in analogy to other AAA unfoldases*", should
be stimulated by substrates. We observed that purified casein^pArg could
stimulate the ATPase activity of wild-type ClpC to a similar extent to
MecA (Fig. 5g). In strong contrast, the ATPase of the ClpC^EA mutant
could be induced by MecA, but not by casein^pArg, confirming the selective
loss of pArg-dependent functions in this mutant.


Biological role of the pArg-ClpCP system 

The developed ClpC^EA mutant represents a valuable tool for addressing
the biological role of pArg-dependent protein degradation. To this
end, we analysed the ability of the ClpC^EA mutant to suppress growth
defects of a ΔclpC strain at increased temperatures (Fig. 6a). Whereas


and MecA interactors. Bottom right, zoomed-in view illustrates that the 
pArg phosphoryl group and Glu184 (Glu198 in second binding site) of 
MecA compete for the same ClpC binding pocket. e, Scheme representing 
the distinct substrate (S) preferences of the wild-type ClpCP and ClpCP^EA
protease complexes. f, Degradation assays comparing the activity of wild-type
ClpCP and ClpCP^EA towards MecA-delivered (left) and pArg-labelled
(middle, right) casein. YwlE was used as a control for the pArg-dependent
degradation. g, ATPase activity of ClpC and ClpC^EA in the presence of
putative substrate proteins. Levels are normalized to the induced ATPase
activity of the ClpC-casein^pArg complex. Error bars show the s.d. of three
independent experiments. 


the expression of wild-type ClpC from a plasmid restored thermotol- 
erance and even made the bacteria more robust in surviving increased 
temperatures, expression of the ClpCP^EA mutant could not rescue the
bacteria. As the ClpCP^EA mutant is fully functional as a protease and
can team up with adaptor proteins, these data highlight the essential 
role of the pArg-dependent degradation pathway in surviving prote- 
otoxic stress situations. 


Discussion 

Energy-dependent proteases are essential for all living organisms to 
carry out protein quality control and degrade short-lived regulatory 
proteins. In contrast to eukaryotes, which universally use polyubiq- 
uitin chains for marking target proteins, a general post-translational 
modification regulating proteolysis in bacteria is not known. Here, we 
characterize such a tagging system. We show that pArg is a degrada- 
tion mark for the ClpCP proteolytic machine, present in most Gram- 
positive species. Despite differing in size, the bacterial pArg modifier 
shares several features with the eukaryotic polyubiquitin degradation 
tag. First, both pArg and polyubiquitin are post-translationally attached 
to substrates, allowing for dynamic regulation of degradation that is 
not available to mechanisms relying on sequence-encoded degrons. 
Second, the pArg mark is recognized by highly specific receptor sites 
on the NTD of ClpC (Fig. 6b), as is ubiquitin by special receptor 
proteins of the 19S regulatory particle. Third, owing to charge inver- 
sion, the phosphorylation of arginine residues is predicted to destabilize 
the native structure of substrate proteins, priming them for subsequent 
catalysed unfolding. Similarly, the polyubiquitin tag affects the structure
and stability of marked proteins*!. Fourth, the pArg tag is reversibly
attached to substrate proteins. Just as enzymes building polyubiquitin
chains are opposed by de-ubiquitinases, the activity of the McsB kinase
is counteracted by the pArg-specific phosphatase YwlE, thus allowing
for regulation of the pArg degradation pathway. 

In addition to revealing the pArg degradation tag, our study clarifies 
the mechanism of the ClpCP protease. Notably, ClpC has been reported 
to be a unique AAA enzyme that requires accessory proteins to assem- 
ble its functional hexameric form*. The present data suggest that this 
model is not fully correct, as the degradation of pArg-containing

Figure 6 | The pArg-ClpCP degradation system. a, Thermotolerance
assay analysing the in vivo complementation of ΔclpC by expressing ClpC
and ClpC^EA. Levels are normalized to the values before heat shock (time 0),
and numbers above bars represent the fraction of cells surviving after 2 h
heat shock. CFU, colony-forming units. Error bars show the s.d. of three
independent experiments. b, The pArg-ClpCP system. Left, cartoon
representation shows that after phosphorylation by the McsB arginine
kinase, pArg-tagged proteins are targeted to the ClpCP protease. Binding
of pArg proteins to one of the 12 NTD binding pockets stimulates the
ATPase activity of ClpC, leading to the translocation of the captured
substrate into the ClpP protease cage and to protein degradation. Right,
a model of the respective ClpCP complex (ClpC^NTD in light grey,
ClpC AAA1/2 in white, ClpP in dark grey, substrate in black).


substrates does not require any additional co-factors. Substrate recruit- 
ment itself induces the ATPase activity of ClpC and promotes assembly 
of the functional ClpCP complex. Similar to the eukaryotic 26S protea- 
some, binding of specifically marked substrates is thus directly linked 
to protease activation. On the basis of the described functional similar- 
ities, the discovered pArg-ClpCP system seems to represent a simple 
bacterial version of the eukaryotic ubiquitin-proteasome system. 

Our study also provides important insights into the biological role of 
pArg as a degradation tag. Analysis of the in vivo ClpP degradome sug- 
gests that the pArg tag is crucial not only for the regulatory proteolysis 
of CtsR but also for general turnover of structurally and functionally 
diverse proteins. Furthermore, we observed that pArg-dependent pro- 
tein degradation is vital for coping with proteotoxic stress (Fig. 6a), and 
that pArg proteins are markedly enriched in the aggregate fraction of 
B. subtilis cells (Supplementary Tables 4 (ΔclpP) and 5 (wild-type)). Of
note, 50% of the in vivo phosphorylated proteins!*'® carry the phosphomark
in a region predicted to adopt a defined secondary structure,
that is, in an α-helix or β-strand. Presumably, these sites would only be
accessible to McsB when present in an at least partially unfolded state, 
indicating that McsB might target the damaged form of those proteins. 
Consistent with this, a recent in vivo study points to the importance 
of the McsB kinase for removing aberrant proteins in B. subtilis: the 
deletion of the kinase led to the accumulation and aggregation of an 
unstable model protein, while the levels of a stably folded counter- 
part of this model protein were not influenced**. We thus presume 
that protein arginine phosphorylation may have a role in the quality 
control of bacterial proteins, targeting unstable and aggregation-prone
proteins for ClpCP degradation. Selecting arginine, of all amino acids, as the
residue whose modification decides the fate of aberrant proteins seems to make sense.
As arginine-rich patches correlate with aggregation propensity, adding 
a phosphoryl group to arginine residues could hinder aggregation and, 
at the same time, promote the clearance of such problematic protein 
species by co-working protease machines. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper.

METHODS


DNA construct design. For recombinant protein overexpression in Escherichia coli, 
the clpP, mcsA, mcsB, clpC and clpCch}? (NTD) genes/fragments from B. subtilis 
and clpA!!° (NTD) from E. coli were cloned into pET21 or pET SUMO vectors 
(Novagen) conferring a terminal hexahistidine (6His) tag. Previously published 
pET21-derived plasmids were used for the expression of Geobacillus
stearothermophilus McsB, YwlE and YwlE^D118N proteins!?*°. For protein expression in
B. subtilis, genes were cloned into the vector pHCMC05 (ref. 35) that contains the
Pspac IPTG-inducible promoter. For the expression of the ClpP, ClpP^TRAP (S98A),
ClpP^X (E119A/R142E), ClpP^X-TRAP (S98A/E119A/R142E), ClpC and ClpC^EA (E32A/
E106A), a C-terminal 6His tag including a Leu-Glu linker was introduced by PCR
amplification. Single point mutations were generated using the QuikChange II 
mutagenesis kit (Agilent Technologies). 
B. subtilis strains. The pHCMC05 plasmids described above were transformed 
into wild-type B. subtilis (strain 168) (ATCC 2385). To generate ClpC-knockout 
(ΔclpC) bacteria, genomic DNA from a B. subtilis ΔclpC::tet strain (ref. 36) was transformed
into the B. subtilis strains containing the pHCMC05, pHCMC05-clpP^X,
pHCMC05-clpP^X-TRAP, pHCMC05-ClpC or pHCMC05-ClpC-E32A-E106A
plasmids. The disruption of clpC in the resulting strains was confirmed by sequencing.
A previously published B. subtilis ΔclpP::spec^R (ref. 37) strain was used to grow
ClpP-knockout bacteria for phosphoproteomic analysis. 
Culture procedures. All bacterial cultures were grown in Luria-Bertani (LB) 
medium. For the B. subtilis ΔclpC and ΔclpP strains, tetracycline (10 µg ml^-1)
and spectinomycin (100 µg ml^-1) were added, respectively. E. coli and B. subtilis
cultures containing the pHCMC05-derived plasmids were cultured in the presence
of ampicillin (50 µg ml^-1) and chloramphenicol (10 µg ml^-1), respectively. E. coli
BL21 (DE3) containing pET21- or pET SUMO-derived vectors were cultured in
the presence of ampicillin (50 µg ml^-1).
In vivo pull-downs using ClpP trapping mutants. For the ClpP^X-TRAP pull-down
experiment in the wild-type background of B. subtilis (Fig. 1b), 6 independent
B. subtilis cultures were grown in LB media expressing either ClpP^X (3 control
cultures to identify unspecific binding partners) or ClpP^X-TRAP (3 sample cultures).
After the cells were grown at 37 °C until mid-exponential phase, expression of
His-tagged ClpP^X or ClpP^X-TRAP proteins was induced with 1 mM IPTG.
Recombinant protein expression proceeded for 3h at 37 °C. To induce the activity 
of heat-shock proteins, including McsB and various Clp ATPases, the cultures were 
incubated in a pre-warmed incubator at 45°C for 45 min. Cells were collected by 
centrifugation, resuspended in lysis buffer (25 mM Tris, pH 7.5, 150 mM NaCl, 
10% glycerol) and stored at −80 °C. For the pull-down experiment comparing
ClpP-trapped proteins in wild-type and ΔclpC backgrounds (Fig. 1c), the same
procedure was applied using 12 independent B. subtilis cultures (3 wild type with
ClpP^X, 3 wild type with ClpP^X-TRAP, 3 ΔclpC with ClpP^X, 3 ΔclpC with ClpP^X-TRAP).
The thawed cell suspensions were incubated for 1 h on ice with 2 mg ml^-1
lysozyme (Sigma), Complete protease inhibitor cocktail (Roche), 0.2 mM PMSF
(Sigma) and 10 µg ml^-1 DNase (Sigma). Cells were sonicated and the resultant
lysate was cleared by centrifugation at 4°C. For the purification of 6His-tagged 
ClpP, the lysate was incubated with Dynabeads His-Tag Isolation & Pulldown 
(Invitrogen) for 1 h at 4 °C. The beads were then washed 5× with lysis buffer and
2× with lysis buffer containing 50 mM imidazole. Two aliquots (5 or 10%) of the
resulting beads were collected for SDS-PAGE analysis (protein elution with dena- 
turing SDS-PAGE sample buffer) or Tris-acetate native-PAGE (protein elution with 
lysis buffer containing 500 mM imidazole). The remaining beads were subjected to 
reduction with 2mM DTT (56°C, 40 min), alkylation with 10 mM iodoacetamide 
(room temperature, in the dark, 45 min), and digestion with 2.5 µg Trypsin Gold
(Promega) at 37°C for 12h. 
MALDI-MS analysis of ClpP purifications. To analyse the mass of B. subtilis 
proteins co-purifying with ClpP(6His) under heat-shock conditions (Extended 
Data Fig. 1), matrix assisted laser desorption ionization time-of-flight mass 
spectrometry (MALDI-TOF-MS) was performed. The corresponding protein 
purifications were spotted on a MALDI plate using a sinapinic acid (10 mg ml^-1)
matrix prepared in 50% acetonitrile (ACN) and 0.1% trifluoroacetic acid (TFA). 
The samples were measured in a 4800 MALDI-TOF-TOF (AB Sciex) instrument 
operated in linear mode. Calibration was performed internally using cytochrome
c as standard.
Sample preparation for phosphoproteomic analysis. For the phosphoproteomic 
analysis of total cell extracts of B. subtilis AclpC, cells were lysed by sonication 
in buffer (4% SDS, 100 mM Tris, pH 7.5, 100 mM DTT) and further processed
using a filter aided sample preparation (FASP)** modified method, as described 
previously’°, followed by trypsin digestion at 37°C for 12-16h. Protein aggregates 
were dissolved in the SDS buffer and processed in the same manner. Trypsin digestion 
completion was inspected by retention time and UV intensity (214 nm) distribution 
upon reverse-phase high-performance liquid chromatography (RP-HPLC) separation
of a 0.1% aliquot of the resulting supernatants on a monolithic column
(Ultimate Plus equipped with a PepSwift PS-DVB column, 5 cm × 200 µm, Dionex-
Thermo-Fisher).

For the ClpP in vivo pull-down assays, a small aliquot (0.5%) of the on-bead
trypsin digests was collected for subsequent quantitative analyses of co-purified
proteins. The biological replicates were then pooled and further processed for 
phosphorylation analysis. Before phosphopeptide enrichment, sample digests 
were purified from buffer reagents by RP-C18 solid phase extraction at neutral 
pH using Oasis HLB cartridges (Waters). A previously described TiO2 protocol’,
optimized in accordance to the acid-labile nature of phosphoarginine, was used 
for phosphopeptide enrichment. 
LC-MS/MS analysis. Reverse-phase separation of all peptide mixtures was carried 
out on an Ultimate 3000 RSLC nano-flow chromatography system (Thermo 
Scientific), using 0.5% acetic acid (pH 4.5 with NH3) as loading solvent, to prevent 
phosphoarginine hydrolysis during removal of salts in the pre-column 
(PepMapAcclaim C18, 5 mm × 0.3 mm, 5 µm, Thermo Scientific). Peptide
separation was achieved on a C18 separation column (PepMapAcclaim C18,
50 cm × 0.75 mm, 2 µm, Thermo Scientific) by applying a linear gradient from 2%
to 35% solvent B (80% ACN, 0.08% formic acid) in 120 or 240 min (pull-down and
total extract samples, respectively) at a flow rate of 230 nl min^-1. Solvent A was
2% ACN, 0.1% formic acid. The separation was monitored by UV detection and 
the outlet of the detector was directly coupled to the nano-electrospray ionization 
source (Proxeon Biosystems) for MS analysis. 
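
The gradient described above can be turned into numbers with a short calculation. The sketch below interpolates the solvent B fraction linearly over the gradient and derives the acetonitrile content of the mixed eluent from the stated solvent compositions (A: 2% ACN; B: 80% ACN). It assumes that the gradient starts at time zero and ignores loading, wash and re-equilibration steps, which are not described in the text; the function names are hypothetical.

```python
def percent_b(t_min, gradient_min, start_b=2.0, end_b=35.0):
    """Solvent B (%) at time t for a linear gradient, clamped to its window."""
    if gradient_min <= 0:
        raise ValueError("gradient length must be positive")
    progress = min(max(t_min / gradient_min, 0.0), 1.0)
    return start_b + (end_b - start_b) * progress


def percent_acn(b_percent, acn_in_a=2.0, acn_in_b=80.0):
    """Acetonitrile content (%) of the eluent for a given solvent B share."""
    b = b_percent / 100.0
    return acn_in_a * (1.0 - b) + acn_in_b * b


def eluent_volume_ul(gradient_min, flow_nl_per_min=230.0):
    """Total eluent volume (microlitres) delivered over the gradient."""
    return gradient_min * flow_nl_per_min / 1000.0


for length in (120, 240):  # pull-down and total extract samples, respectively
    midpoint_b = percent_b(length / 2, length)
    print(f"{length} min gradient: {midpoint_b:.1f}% B "
          f"({percent_acn(midpoint_b):.2f}% ACN) at midpoint, "
          f"{eluent_volume_ul(length):.1f} ul total")
```

Because both gradients span the same 2% to 35% B range, doubling the gradient time halves the slope, which is the usual way to gain peak capacity for the more complex total extract samples.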

For phosphorylation analysis, TiO2 elution samples were infused into the LTQ
Orbitrap Velos Pro ETD mass spectrometer (Thermo Scientific) using PicoTip 
nanospray emitter tips (New Objective) at a voltage of 1.5 kV. Peptides were ana- 
lysed in data-dependent fashion in positive ionization mode, applying two different 
fragmentation methods: collision-induced dissociation (CID) and electron-trans- 
fer dissociation (ETD). The survey scan was acquired at resolution 60,000 and the 
6 most abundant signals with charge state equal to or higher than 2+ and exceeding
an intensity threshold of 1,500 counts were selected for peptide fragmentation 
analysis. For MS/MS experiments, precursor ions were isolated within a 2.1-Da 
window centred on the observed m/z. To prevent repeated fragmentation of highly 
abundant peptides, selected precursors were dynamically excluded for 30s from 
MS/MS analysis. CID fragmentation was achieved at normalized collision energy 
(NCE) of 35% with additional activation of the neutral loss precursor at M-49, 
M-32.7 and M-98 amu in a standard multistage activation method. For ETD, peptides
were incubated with fluoranthene anions allowing for charge-state-dependent
incubation times (90 ms for 3+ charged peptides), and resulting peptide fragments 
were detected in the ion trap analyser. 

For the identification of co-purified proteins in the ClpP in vivo pull-down 
assays, slightly different instrument settings were used. The 12 most abundant 
signals with charge state equal to or higher than 2+ and exceeding an intensity 
threshold of 500 counts were selected for CID peptide fragmentation analysis, 
applying an isolation window of 2 Da. Multistage activation was disabled. Selected 
precursors were dynamically excluded for 60s from MS/MS analysis. Each pull- 
down digest was analysed twice, to evaluate technical reproducibility. 

MS data analysis. For the phosphorylation analysis of TiO2-enrichment samples, 
raw data were extracted by the Proteome Discoverer software suite (version 
1.4.0.288, Thermo Scientific) and searched against a combined forward/reversed 
database of B. subtilis Uniprot Reference Proteome with common contaminants 
added (4,455 entries in total) using MASCOT (version 2.2.07, Matrix Science). 
Carbamidomethylation of cysteine was set as fixed modification. Phosphorylation 
of serine, threonine, tyrosine and arginine plus oxidation of methionine were 
selected as variable modifications. Since tryptic cleavage is impaired at phospho- 
rylated arginine, a maximum of two missed cleavage sites was allowed, whereas 
fully tryptic cleavage of both termini was required. The peptide mass deviation 
was set to 5 p.p.m.; fragment ions were allowed to have a mass deviation of 0.8 Da. 
False discovery rates were assessed using the Percolator tool within the Proteome 
Discoverer package. The results were filtered for peptide rank 1 and high identifica- 
tion confidence, corresponding to a 1% false discovery rate. Low-scoring peptides 
(Mascot ion score < 20) were manually verified. In the rare cases in which a peptide 
was mapped to more than one protein sequence, both protein hits are reported. For 
reliable phosphorylation site analysis, all phosphopeptide hits were automatically 
re-analysed by the phosphoRS software within the Proteome Discoverer software 
suite. All the phosphopeptides identified in the ClpP in vivo pull-down assays 
were manually inspected. For other samples, we considered a phosphorylation site 
to be localized when the reported phosphoRS probability was higher than 90%. 
When multiple peptide-spectrum matches (PSMs) were obtained for the same 
phosphopeptide, only the PSM presenting the best identification/localization score 
compromise is presented. The multiple redundant PSMs were ranked according 
to their phosphoRS probability score into three categories (90-94%, 94-97% and 
97-100%); the PSM presenting the best Mascot score within the highest phosphoRS 
category achieved was reported. PSMs presenting wrong or inconclusive 
localizations were thus excluded from the final list of phosphopeptides. Multiply 
phosphorylated peptides were also excluded from the analysis, because they cannot 
be classified into ‘phosphorylation type’ categories. 
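
The selection of one representative PSM per phosphopeptide described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the `PSM` record, the function names and the handling of values that fall exactly on a bin edge (94% or 97%) are hypothetical choices, since the text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str           # phosphopeptide identity (sequence plus modified site)
    mascot_score: float    # Mascot ion score
    phosphors_prob: float  # phosphoRS site probability, in percent


def phosphors_category(prob: float) -> int:
    """Return 0 (not localized) or 1-3 for the 90-94%, 94-97% and 97-100% bins.

    Assumption: a value of exactly 94% or 97% goes to the higher bin.
    """
    if prob >= 97.0:
        return 3
    if prob >= 94.0:
        return 2
    if prob > 90.0:
        return 1
    return 0


def select_representative_psms(psms):
    """Keep one PSM per phosphopeptide: highest phosphoRS bin first, then best Mascot score."""
    best = {}
    for psm in psms:
        category = phosphors_category(psm.phosphors_prob)
        if category == 0:
            continue  # site not localized (probability of 90% or lower)
        key = (category, psm.mascot_score)
        if psm.peptide not in best or key > best[psm.peptide][0]:
            best[psm.peptide] = (key, psm)
    return {peptide: psm for peptide, (_, psm) in best.items()}
```

Ranking on the tuple (bin, Mascot score) means that a well-localized PSM always wins over a higher-scoring but poorly localized one, which matches the stated preference for localization confidence over identification score.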

For quantitative analysis of ClpP-trapped proteins, MS data were analysed using 
the MaxQuant software environment, version 1.5.1.2, and its built-in Andromeda 
search engine, against the B. subtilis Uniprot database described above. Strict 
trypsin specificity with up to two missed cleavages was used. The minimum 
required peptide length was set to six amino acids. Carbamidomethylation of 
cysteine was set as a fixed modification and N-acetylation of protein N termini 
(42.010565 Da) and oxidation of methionine were set as variable modifications. 
During the main search, parent masses were allowed an initial mass deviation of 
4.5 p.p.m. and fragment ions were allowed a mass deviation of 0.5 Da. The mass 
accuracy of the precursor ions was improved by time-dependent recalibration 
algorithms of MaxQuant. The ‘match between runs’ option was enabled to match 
identifications across samples within a time window of 2 min of the aligned reten- 
tion times. The second peptide identification option in Andromeda was enabled. 
PSM and protein identifications were filtered using a target-decoy approach at a 
false discovery rate of 1% for PSMs and 5% for proteins. Relative, label-free quantification 
of proteins was done using the MaxLFQ algorithm integrated into 
MaxQuant using default parameters. Unique and razor peptides were considered 
for quantification. 
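
To make the mass tolerances above concrete, a tolerance in p.p.m. scales with the mass being matched, whereas a tolerance in Da is fixed. The short sketch below converts a p.p.m. tolerance into an absolute window. The function names are illustrative only and do not correspond to MaxQuant or Mascot settings.

```python
def ppm_window(mz: float, tol_ppm: float):
    """Return the (low, high) m/z bounds for a symmetric tolerance given in p.p.m."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the observed value deviates from the theoretical one by at most tol_ppm."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


if __name__ == "__main__":
    # A 4.5 p.p.m. tolerance at m/z 1,000 is a window of about +/-0.0045.
    print(ppm_window(1000.0, 4.5))
```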

Statistical evaluation of the resulting protein quantifications was performed 
using R scripting. Proteins quantified in less than 50% of the samples were filtered 
out. Missing LFQ values were substituted by the lowest value observed in the cor- 
responding sample. For each protein, the fold change of LFQ-averaged intensities 
(ratio ClpP^TRAP/control) and the corresponding P value (Limma test; Linear 
Models for Microarray Data) were calculated. A protein was considered to be a 
ClpP substrate when it was found to be enriched in the TRAP pull-down assays by 
a factor of at least 2 and P < 0.05. The ‘protein groups’ output file from MaxQuant 
containing the statistical evaluation is available in Supplementary Table 1. The mass 
spectrometry data have been deposited to the ProteomeXchange Consortium via 
the PRIDE partner repository (http://proteomecentral.proteomexchange.org). 
Representative spectra of the pArg peptides identified in the ClpP trapping mutant 
pull-downs are presented in Supplementary Fig. 1. 
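
The filtering, imputation and fold-change steps of the statistical evaluation above can be illustrated with a small sketch. The authors used R and the Limma test; the Python below is not their script and does not reproduce the Limma P value, which is taken as an externally supplied number. The data layout (a dictionary of per-sample LFQ intensities with `None` for missing values) and all function names are assumptions made for illustration.

```python
from statistics import mean


def filter_and_impute(lfq, min_fraction=0.5):
    """Drop proteins quantified in fewer than min_fraction of samples, then fill gaps.

    lfq maps protein -> list of LFQ intensities, one per sample (None = missing).
    Missing values are replaced by the lowest value observed in the same sample.
    Assumption: the per-sample minimum is taken over the proteins that pass the filter.
    """
    n_samples = len(next(iter(lfq.values())))
    kept = {
        protein: values
        for protein, values in lfq.items()
        if sum(v is not None for v in values) / n_samples >= min_fraction
    }
    column_min = []
    for i in range(n_samples):
        observed = [values[i] for values in kept.values() if values[i] is not None]
        column_min.append(min(observed))  # raises ValueError if a sample has no values
    return {
        protein: [v if v is not None else column_min[i] for i, v in enumerate(values)]
        for protein, values in kept.items()
    }


def fold_change(values, trap_idx, control_idx):
    """Ratio of the mean LFQ intensity in trap samples to that in control samples."""
    return mean([values[i] for i in trap_idx]) / mean([values[i] for i in control_idx])


def is_substrate(fc, p_value):
    """Apply the stated criteria: enrichment of at least 2-fold and P < 0.05."""
    return fc >= 2 and p_value < 0.05
```

In this layout, `trap_idx` and `control_idx` are the column positions of the trapping-mutant and control pull-downs. Any moderated test can be used to supply `p_value`, but results will only match the published ones if the Limma procedure itself is used.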
Expression and purification of recombinant proteins. For the overexpression of 
recombinant proteins in E. coli BL21 (DE3), LB cultures were grown at 37 °C until 
the exponential phase, when expression was induced with 0.5mM IPTG. After 
expression, cells were collected by centrifugation, resuspended in buffer A and 
stored at −80 °C. As the B. subtilis ClpC protein was unstable when expressed in 
E. coli, the production of wild-type ClpC(6His) and ClpC®(6His) was performed 
in B. subtilis containing the corresponding pHCMC05 plasmids, by induction with 
1 mM IPTG for 3 h at 37 °C. The optimal expression strategies and purification 
buffers are summarized in Extended Data Table 3. 

Cell suspensions were incubated on ice for 30 min in the presence of 1 mg ml⁻¹ 
lysozyme, 0.1 mM PMSF, 10 µg ml⁻¹ DNase and sonicated. Lysates were cleared 
by centrifugation and loaded on a 5 ml Ni- or Co-NTA column (GE Healthcare 
LifeSciences) equilibrated in buffer A. Washes were performed using a step- 
wise imidazole gradient, typically starting with 25 mM. The His-tagged proteins 
were eluted with buffer A containing 250 mM imidazole and concentrated using 
Vivaspin devices (Sartorius Stedim Biotech). Constructs expressed as a SUMO- 
fusion (SUMO-ClpP and SUMO-MecA) were incubated with SUMO Protease 
(Thermo Fisher Scientific) to obtain tag-free versions of the proteins. All resulting 
proteins were further purified by gel filtration on a Superdex-75 or -200 column 
(GE Healthcare LifeSciences) equilibrated with buffer B. For the purification of 
McsA(6His), the full-length protein was separated from an abundant cleavage 
product by ion exchange on a MonoQ column (GE Healthcare LifeSciences) using 
a 0.1-1 M NaCl gradient in 50 mM Tris, pH 8.5, 1 mM TCEP. All proteins were 
aliquoted and stored at −80 °C until further use. 

For the purification of the B. subtilis ClpCN™ and NTD*4 mutant, affinity, ion 
exchange, and size exclusion chromatography were carried out in a 0.5 x PBS buffer 
(6 mM Na/K phosphate, pH 7.25, 1.35 mM KCl, 68.5 mM NaCl). After elution from 
Ni-NTA using the PBS buffer supplemented with 250 mM imidazole, the protein 
was passed through a ResourceQ column (GE Healthcare LifeSciences). This was 
followed by gel filtration on a Superdex 200 column (GE Healthcare LifeSciences). 
In vitro pull-down assays. Varying amounts of the analysed components (individual 
proteins, pArg*) were incubated in 200 µl of reaction buffer (10 mM Tris, 
pH 8.0, 50 mM NaCl, 5 mM MgCl2 and 25 mM imidazole). After 5 min, 50 µl of 
Ni-Sepharose was added to capture His-tagged proteins. Sepharose was washed 
three times with 200 µl reaction buffer and bound proteins were eluted with 50 µl 
of elution buffer (10 mM Tris, pH 8.0, 50 mM NaCl, 5 mM MgCl2 and 1 M imidazole). 
For visualization, input and elution fractions were analysed by SDS-PAGE. 
Native-PAGE. For the analysis of the oligomeric state of purified ClpP samples, 
a Tris-acetate non-denaturing PAGE system was used. Gel composition 
corresponded to 7, 10 or 15% acrylamide, 0.24% bis-acrylamide, 200 mM Tris-acetate, 
pH 7, polymerized in the presence of 0.042% ammonium persulfate 
(APS) and 0.125% N,N,N’,N’-tetramethylethylenediamine (TEMED). The run- 
ning buffer composition was 25 mM Tris-HCl, 192 mM glycine, pH 8.3. Protein 
separation was performed at 4°C and 150 mV for 3-4 h, and proteins were visual- 
ized by Coomassie-based InstantBlue protein stain (Expedeon). The NativeMark 
unstained protein standard (Thermo Fisher Scientific) was used for the estimation 
of ClpP oligomeric state. 

Purification of casein^pArg. To produce the arginine-phosphorylated substrate, 
β-casein from bovine milk (Sigma) was incubated at 10 µM (all protein concentrations, 
if not otherwise mentioned, are for a single protomer) with 2 µM McsB 
and McsA from B. subtilis in 25 mM Tris, pH 7.5, 50 mM NaCl, 20 mM MgCl2 and 
5 mM ATP at 30 °C for 2 h. The reaction mixture was concentrated with a Vivaspin 
device and applied to a Superdex-200 size exclusion column equilibrated with 
25 mM Tris, pH 7.5, 50 mM NaCl. The fractions most strongly enriched in β-casein 
over McsB (Extended Data Fig. 5a) were then pooled, concentrated and stored at 
−80 °C. The presence of the pArg modification was confirmed by immunoblotting 
using a pArg-specific antibody as described below. 

To obtain casein^pArg of higher purity, a similar phosphorylation reaction was 
performed, except that 4 µM of the G. stearothermophilus McsB(6His) was used 
instead of the B. subtilis McsBA complex. The reaction mixture was afterwards 
applied to a Ni-NTA column (GE Healthcare LifeSciences) to reduce the amounts 
of the His-tagged McsB before the final gel filtration purification step (Extended 
Data Fig. 5c). 

To prepare β-casein phosphorylated to different degrees (Fig. 4e), a large-scale 
phosphorylation reaction was set up with the B. subtilis McsBA at 30 °C. An aliquot 
(time 0) was collected before the addition of ATP. After adding ATP, aliquots were 
taken after 10, 30, 60 and 120 min. After adding 100 mM EDTA (to stop phosphorylation), 
each aliquot was concentrated by Vivaspin ultrafiltration and applied to 
a Superdex 200 size exclusion column. An additional sample, which was collected 
after 120 min incubation with McsBA, was treated with 1 µM YwlE arginine phosphatase 
for 2 h. The phosphatase was then inactivated by adding 2 mM pervanadate 
and the sample was concentrated and submitted to size exclusion chromatography. 
The different casein^pArg preparations were concentrated and stored at −80 °C. 

To quantify the increase of arginine phosphorylation over time, a western blot 
was performed using the pArg-specific antibody described previously. The 
casein^pArg preparations (2 µg) were separated by SDS-PAGE, transferred to a 
nitrocellulose membrane and fixed using a 0.4% formaldehyde solution in PBS, 
pH 7.5, for 30 min as previously described. After blocking, the pArg-specific 
primary antibody (2 µg ml⁻¹, Morphosys AG) was incubated overnight at 4 °C 
and the secondary antibody (goat anti-human IgG F(ab’)2:HRP, AbD Serotec) 
was used at a 1:7,000 dilution for 1.5 h at room temperature. The detection was 
performed using ECL Plus western blotting substrate (Pierce). The signals were 
quantified using the ImageJ software and normalized to the band intensities 
observed in a Coomassie-stained SDS-PAGE gel replicate having 1.3 µg of each 
protein preparation. 

ClpCP in vitro degradation assays. In vitro degradation assays containing 0.16 µM 
ClpC (hexamer), 0.16 µM ClpP (heptamer) and 5 µM β-casein substrate were 
performed in 25 mM Tris, pH 7.5, 150 mM NaCl, 20 mM MgCl2 and 5 mM ATP 
at 30 °C. A different β-casein concentration (10 µM) was used, for better resolution, 
when comparing ClpCP and ClpCP*“ in MecA-dependent degradation. 
Small molecule compounds, for example, phospho-L-arginine (pArg, Toronto 
Biochemicals), L-arginine (Sigma) or sodium phosphate, pH 7.5, were added at 
1 mM. Time-point aliquots were mixed with denaturing SDS-PAGE sample buffer 
containing 100 mM EDTA to stop the reaction and analysed by SDS-PAGE. The 
resulting gels were stained with InstantBlue dye (Expedeon) and quantified using 
ImageJ. Supplementary Fig. 2 shows full SDS-PAGE gels and corresponding 
quantifications of Figs 2-5. 

Radioactive kinase assay. The radioactive kinase assay was performed at 19 °C in 
25 mM Tris, pH 7.5, 50 mM NaCl, 2% glycerol and 20 mM MgCl2. 15 µM β-casein 
substrate was incubated with 1 µM McsB and/or other proteins (1 µM McsA, 5 µM 
YwlE) for 0, 30 and 120 min. The reaction was started by adding ATP (spiked with 
[-°°P] ATP from Perkin Elmer) to a final concentration of 10mM and stopped with 
denaturing SDS-PAGE sample buffer. After resolving the samples by SDS-PAGE, 
phosphorylation was visualized using phosphoimager technology (GE Healthcare 
Life Sciences). 

ATPase assays. ATPase activity was determined by a coupled enzymatic reaction. 
0.125-0.5 µM ClpC was incubated with 18.75 U ml⁻¹ pyruvate kinase, 21.45 U ml⁻¹ 
lactate dehydrogenase, 0.2-0.3 mM NADH, 7.5 mM phosphoenolpyruvate and 
2 mM ATP in 20 mM HEPES, pH 7.5, 100 mM NaCl and 5 mM MgCl2. Further 
assay proteins (MecA, McsB, McsA and β-casein) were added at 4-6-fold excess 
over ClpC. The absorption at 340 nm (A340 nm) was recorded for 60 min using a 
Synergy H1 Multi-Mode Reader. The molar ATPase activity (v) was calculated 
by the equation: $v = \Delta A_{340\,\mathrm{nm}} / (\text{path length} \times 6{,}220\ \mathrm{M^{-1}\,cm^{-1}} \times [\mathrm{ClpC}])$. All 
activity data represent a minimum of three independent experiments and the 
variability is highlighted as standard deviation. 
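
The ATPase activity calculation described above can be sketched in a few lines. In the pyruvate kinase/lactate dehydrogenase coupled assay, one NADH is oxidized for each ATP hydrolysed, so the fall in A340 converts to ATP turnover through the NADH extinction coefficient (6,220 M⁻¹ cm⁻¹). The function name and the choice of a per-minute slope as input are assumptions for illustration; the path length depends on the plate and fill volume and is not given in the text.

```python
NADH_EXTINCTION = 6220.0  # molar extinction coefficient of NADH at 340 nm, M^-1 cm^-1


def atpase_rate(delta_a340_per_min: float, path_length_cm: float, clpc_molar: float) -> float:
    """Return ATP hydrolysed per ClpC per minute from the slope of A340 over time.

    delta_a340_per_min: change in absorbance per minute (negative as NADH is consumed)
    path_length_cm:     optical path length of the sample (assumed known)
    clpc_molar:         ClpC concentration in mol/l
    """
    nadh_molar_per_min = abs(delta_a340_per_min) / (NADH_EXTINCTION * path_length_cm)
    return nadh_molar_per_min / clpc_molar


if __name__ == "__main__":
    # Hypothetical numbers: a slope of -0.0622 per min, 1 cm path length, 1 uM ClpC.
    print(atpase_rate(-0.0622, 1.0, 1e-6))
```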

ITC. ITC measurements were performed using VP-ITC (Microcal). Ligands (pArg, 
L-arginine, phospho-L-tyrosine (pTyr, from Sigma) and sodium phosphate, 
pH 7.5) were prepared at 0.3-1.4 mM in 25 mM Tris, pH 8, 50 mM NaCl, 0.2 mM 
TCEP and titrated to a 20 µM ClpC^DWB (full-length) or 140 µM ClpC^NTD protein 
solution present in the same buffer. The same set-up was used for the analysis of 
ClpA^NTD. The following settings were used: 5 µl (first) and 10 µl (all subsequent 
injections) injection volume, 300 s spacing time between the injections, 300 r.p.m. 
stirring speed, 25°C temperature and overflow mode. Control experiments were 
carried out to correct for dilution effects upon protein/ligand titration. Resulting 
data were analysed with the MicroCal ORIGIN software. 

Structure determination of B. subtilis ClpC^NTD in complex with pArg. 
Crystals of a ClpC^NTD-pArg complex were obtained by the sitting-drop vapour 
diffusion method upon mixing 100 nl of reservoir with 200 nl of a ClpC^NTD (2 mM) 
protein solution containing 2 mM pArg. The optimized crystallization solution 
contained 13.5% (w/w) polyethylene glycol 4000, 500 mM ammonium sulfate, and 
100 mM sodium acetate at pH 5. Crystals formed overnight at 19 °C and were 
soaked/cryo-protected in 40% polyethylene glycol 400, 20 mM Tris pH 8, and 6 mM 
pArg before being flash-frozen. Diffraction data to 1.6 Å were collected at 100 K 
using a wavelength of 0.9763 Å at beamline P14, DESY, Hamburg and integrated 
with XDS. Molecular replacement in Phaser using ClpC^NTD from B. subtilis 
(PDB code 2Y1Q; ref. 29) as a search model yielded a high-confidence solution. 
Refinement in CNS, automatic rebuilding in Phenix, and manual rebuilding 
in Coot were carried out, followed by placing of ligands in Coot. The N(omega)-phospho-L-arginine 
structure and constraints were obtained with the respective 
SMILES code using eLBOW. Rounds of refinement in Phenix and rebuilding 
in Coot yielded the final model with good statistics and geometry (Extended Data 
Table 2 and the following Ramachandran statistics: 98% favoured, 2% allowed, 
0% outliers, 0% rotamer outliers). The feature-enhanced map, which is based 
on a composite residual omit map, was used to show ligand density. Figures were 
produced in Pymol. 

Purification of B. subtilis protein aggregates. For the phosphoproteomic analysis 
of B. subtilis protein aggregates, a 3-litre culture of B. subtilis was grown at 37 °C 
until late exponential phase and then heat-shocked (50 °C) for 45 min. Cells were 
collected by centrifugation, resuspended in 25 mM Tris, pH 7.5, 150 mM NaCl, 
0.5% Triton X-100 and stored at −80 °C. 

A 30-ml cell suspension in 25 mM Tris, pH 7.5, 150 mM NaCl, 0.5% Triton 
X-100 was incubated on ice with 3 mg ml⁻¹ lysozyme, Complete protease inhibitor 
cocktail (Roche), 20 µg ml⁻¹ DNase, 0.2 mM PMSF and 2 mM vanadate for 30 min. 
After dilution to 100 ml, cells were gently lysed at 4 °C by French Press (Constant 
Cell Disruption Systems) at 1.7 kbar. Lysis efficiency, estimated by plating out serial 
dilutions, exceeded 99%. The lysates were centrifuged at 45,000g for 30 min. The 
resulting pellets, containing insoluble protein aggregates, were further washed with 
20 ml lysis buffer containing 0.4 mg ml⁻¹ lysozyme, 10 µg ml⁻¹ DNase, 0.2 mM 
PMSF and 2 mM vanadate. After 40 min homogenization at 4 °C under gentle agitation, 
the pellets were re-centrifuged. The protein aggregates contained in the 
pellets were then solubilized in 7 ml 25 mM Tris, pH 7.5, 8 M urea by sonication. 
The samples were again centrifuged to separate the urea-solubilized protein aggregates 
from cell debris. The resulting supernatants were stored at −80 °C until MS 
sample processing. 
Thermotolerance assay. To test the role of ClpC during heat stress, the following 
B. subtilis strains were investigated: wild type + pHCMC05, ΔclpC::tet 
+ pHCMC05, ΔclpC::tet + pHCMC05-ClpC and ΔclpC::tet + pHCMC05-ClpC*. 
For all experiments, cultures were grown at 37 °C in LB media containing 
10 µg ml⁻¹ chloramphenicol and 0.2 mM IPTG. After reaching exponential phase, 
the cultures were transferred to a pre-warmed incubator at 53°C for 0, 30, 60 
or 120 min, respectively. After heat stress, the samples were diluted sequentially 
and transferred to LB plates. To compare the survival rate after heat stress, we 
determined the number of colony-forming units (CFU). All experiments were 
independently performed three times and the observed variability is highlighted 
as standard deviation. 
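
The survival readout of the thermotolerance assay can be expressed as a short calculation: colony counts from serial dilutions are converted to CFU per ml, divided by the value at time 0, and summarized over the three independent experiments as mean and standard deviation. The sketch below is hypothetical. The dilution factors, plated volume and colony counts are invented for illustration and are not taken from the study.

```python
from statistics import mean, stdev


def cfu_per_ml(colonies: int, dilution_factor: float, plated_volume_ml: float) -> float:
    """Convert a plate count into colony-forming units per ml of undiluted culture."""
    return colonies * dilution_factor / plated_volume_ml


def survival_fraction(cfu_after_stress: float, cfu_before_stress: float) -> float:
    """Fraction of cells surviving heat stress relative to the unstressed (0 min) sample."""
    return cfu_after_stress / cfu_before_stress


if __name__ == "__main__":
    # Hypothetical counts from three independent experiments (0 min vs 60 min at 53 C).
    before = [cfu_per_ml(c, 1e5, 0.1) for c in (120, 135, 110)]
    after = [cfu_per_ml(c, 1e3, 0.1) for c in (60, 81, 44)]
    fractions = [survival_fraction(a, b) for a, b in zip(after, before)]
    print(f"survival: {mean(fractions):.4f} +/- {stdev(fractions):.4f}")
```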


Native mass spectrometry. Native mass spectrometry experiments were carried 
out on a Synapt G2Si instrument (Waters) with a nano-electrospray ionization 
(nESI) source. Mass calibration was performed by a separate infusion of NaI cluster 
ions. Solutions were ionised through a positive potential applied to metal-coated 
borosilicate capillaries (Thermo Scientific). β-casein samples (5 µM) were sprayed 
from 25 mM ammonium acetate, pH 6.8. The instrument settings were capillary 
voltage 1.5 kV, sample cone voltage 30 V, extractor source offset 46 V, and source 
temperature 50°C. Data were processed using Masslynx V4.1 software. 

Data reporting. Source Data for Figs 1-5 are provided in the Supplementary 
Information. No statistical methods were used to predetermine sample size. The 
experiments were not randomized, and investigators were not blinded to allocation 
during experiments and outcome assessment. 

Extended Data Figure 1 | Development and validation of the ClpP^X-TRAP 
mutant. a, Co-NTA purification of wild-type and TRAP mutant 
ClpP(6His) expressed in heat-stressed B. subtilis wild-type cells. SDS-PAGE 
(left) reveals co-purification of two ClpP protein species. MALDI-MS 
analysis (bottom) estimates that the mass difference between the 
two ClpP species was 1,036 and 1,042 Da for wild-type and TRAP ClpP, 
respectively. These values fit to the expected mass of the 6His tag (1,065 Da), 
indicating that the purified proteins correspond to recombinant (tagged) 
and endogenous ClpP. Native-PAGE (right) shows that the two proteins 
are present predominantly as a composite, heptameric complex. Therefore, 
ClpP(6His)^TRAP expression in B. subtilis under heat-stress conditions 
is complicated by the formation of mixed complexes with endogenous 
(active) ClpP (cartoon). As the untagged endogenous ClpP was observed 
at similar levels as ClpP(6His), it is unlikely that efficient substrate-trapping 
complexes, built up exclusively by inactive ClpP^TRAP, are 
formed in vivo. b, Engineering of the ClpP cross mutant. Crystal 
structure of the ClpP heptamer (PDB code 3KTI; ref. 58) shown in top 
(left) and side (right) view. Zoomed-in picture shows interaction between 
Arg142 and Glu119 of two neighbouring subunits. To avoid the interaction 
of the recombinant ClpP(6His) trapping mutant with endogenous ClpP, 
these two residues were inter-exchanged (Glu119Arg/Arg142Glu), 
thus leading to an electrostatic repulsion between the cross-mutant 
ClpP(6His)^X-TRAP and wild-type ClpP, while allowing formation of the 
respective homo-heptamers, as schematically indicated. c, Co-NTA 
purification of ClpP(6His)^X and ClpP(6His)^X-TRAP expressed in heat-stressed 
B. subtilis wild-type cells. SDS-PAGE (left) shows that the 
X variants do not co-purify with endogenous ClpP, demonstrating that 
the Glu119Arg/Arg142Glu mutation prevents the formation of hetero-oligomers 
(see cartoon). Native-PAGE analysis (middle) suggests that the 
ClpP(6His)^X protein has a reduced heptamerization propensity. However, 
the inactive version (Ser98Ala, ‘TRAP’) of the ClpP(6His)^X mutant is 
present predominantly as a heptameric complex, probably owing to the 
stabilization by trapped substrates. ClpP(6His)^X-TRAP thus represents a 
tool to trap substrates in the wild-type background of B. subtilis. Our 
experimental approach has the advantage of avoiding the use of the 
ΔclpP B. subtilis strain, which has an extremely pleiotropic phenotype 
(including increased levels of McsB and McsA) that would largely bias 
the characterization of ClpP substrates. The ClpP^X protein represents the 
ideal negative control: it has reduced ability to heptamerize and therefore 
to degrade proteins, and its overexpression is expected to have little effect 
on the overall levels of protein degradation and consequently on the 
abundance of substrate proteins. 
[Panel titles: Experiment 1: ClpP trapping mutant pull-downs in B. subtilis wt. Experiment 2: ClpP trapping mutant pull-downs in B. subtilis wt vs. ΔclpC.] 

Extended Data Figure 2 | [...] characterization of ClpP^X-TRAP [...] experiments. 
a, SDS-PAGE (left) and native-PAGE (right) analysis of co-NTA purifications of 
experiment 1: pull-down of His-tagged ClpP variants in wild-type B. subtilis. 
Numbers denote biological replicates. Each sample represents approximately 10% 
of the total purification. b, SDS-PAGE (top) and native-PAGE (bottom) analysis 
of co-NTA purifications of experiment 2: pull-down of His-tagged ClpP 
variants in the wild-type (left) and ΔclpC (right) B. subtilis strain. Each 
sample represents approximately 5% of the total purification. 
Extended Data Figure 3 | ITC binding data. For each binding study, the analysed interactions are indicated on the top, and the respective KD values, 
when detected, are shown below. For reference, a structural alignment of the ClpC (grey) and ClpA (purple; PDB code 1K6K) NTDs, showing a high 
degree of structural similarity, is presented. 
Extended Data Figure 4 | Preparation and validation of casein^pArg 
as a model substrate of ClpCP. a, Top, size exclusion chromatography 
(SEC) separation of casein^pArg from the B. subtilis McsBA complex after 
in vitro phosphorylation. The red markings indicate the fractions that 
were pooled. Bottom, SDS-PAGE analysis of the fractions indicates 
that the SEC procedure could separate at least 95% of the McsB protein 
from the pooled β-casein fractions. b, Evaluating the effect of the McsB 
contamination on the degradation of casein^pArg. Top, ClpCP in vitro 
degradation assay towards untreated β-casein. Middle, ClpCP in vitro 
degradation of casein^pArg, with and without additional McsB kinase. 
Bottom, addition of either inactive or active McsB did not have an effect 
on the initial degradation rate of casein^pArg. Of note, the degradation of 
untreated casein in the presence of McsBA is slightly delayed compared 
to the degradation of pre-modified casein^pArg. c, An alternative, improved 
protocol for purifying casein^pArg. After in vitro phosphorylation of 
β-casein by G. stearothermophilus McsB(6His), a Ni-NTA column was used 
to capture the tagged kinase. The flow-through fraction was then applied 
to a SEC column to further separate remaining McsB from β-casein. 
Two different fractions of the β-casein peak were collected, the later-eluting 
one (yellow) having a higher degree of purity in relation to the earlier 
one (orange). d, Activation of the ClpC ATPase activity by the different 
casein^pArg preparations (2 µM concentration each). Each fraction contains 
a different (substoichiometric) amount of the McsB contamination. 
ATPase measurements reveal an almost identical activation of ClpC for 
all fractions, indicating that the residual amounts of McsB present in 
the casein^pArg sample do not contribute to ClpC activation. Error bars 
denote standard deviation of technical triplicates. e, To obtain substrate 
samples varying in the amount of arginine phosphorylation, casein was 
pre-incubated with McsBA for increasing times. The YwlE arginine 
phosphatase was added to the sample phosphorylated most strongly 
(120 min incubation with McsBA) as a negative control. The resultant 
casein^pArg samples were subjected to pArg immunoblots using a pArg-specific 
antibody (right, top) and to ClpCP degradation assays (right, 
bottom). 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




Extended Data Figure 5 | Binding pockets of pArg, pTyr and pSer/Thr. 
The binding pockets are shown as surface representation and coloured 
according to their electrostatics as calculated with PyMol (blue: positive, 
red: negative). Bound phosphoamino acids are presented as sticks with 
nitrogens and oxygens coloured blue and red, respectively. a, b, pArg-binding 
sites 1 and 2, respectively, of the ClpC^NTD domain. The sites are 
characterized by a ‘bipolar’ architecture with both a positive and a negative 
area, jointly required to recognize a pArg side chain. c, d, pTyr-binding 
site of the Src SH2 domain (PDB code 1SPS; ref. 59) and pSer/Thr-binding 
site of the 14-3-3 domain (PDB code 1QJB; ref. 60). Both pTyr and pSer 
were part of a peptide but are shown in isolation for clarity. In contrast to 
the pArg-binding site, pTyr- and pSer/Thr-specific pockets are uniformly 
positively charged. 
Extended Data Figure 6 | Sequence alignment of the pArg-binding site 
of ClpC from different species and of the homologous regions of other 
Clp ATPases. The two symmetrical regions (comprising residues 6-68 and 
80-142, approximately) of each protein are aligned. Residues interacting 
with the pArg molecule (the Glu residue binding to the guanidinium 
group and the Arg/Thr residues interacting with the phosphate) are 
circled in black and marked by an arrow. Each of the two pArg-binding 
sites comprises Glu and Thr from one symmetrical region and Arg and 
Thr from the other. The alignment shows high conservation of the critical 
residues of ClpC proteins from different McsB-containing bacteria 
(B. subtilis, Listeria monocytogenes, Staphylococcus aureus, Bacillus 
anthracis and Peptoclostridium difficile). Conversely, the residues are not 
conserved in related Clp proteins (ClpA and ClpB) from McsB-deficient, 
Gram-negative bacteria. 
Extended Data Figure 7 | Intact mass analysis of McsB-treated 
and untreated (control) β-casein. a, Unprocessed MS spectra. 
Zoomed view of the 13+ charge state species shows two predominant 
arginine phosphorylation states in the McsB-treated sample (top): 
1 phosphorylation (diamond, m/z 1,852) and 2 phosphorylations (triangle, 
m/z 1,852), while only the non-pArg form (circle, m/z 1,848) can be 
visualized in the untreated control (bottom). b, Deconvoluted MS 
spectra, showing the average proportion of unmodified (circle, 23,982 Da), 
1 pArg-containing (diamond, 24,063 Da) and 2 pArg-containing (triangle, 
24,142 Da) β-casein. 
Extended Data Table 1 | Summary of identified phospho-sites in various ClpP-trapping experiments 

Experiment 1 
Protein descriptions (columns: control, TRAP) 

Putative zinc protease AlbF                        R17 
2-oxoisovalerate dehydrogenase subunit b 
Cu-sensing transcriptional repressor CsoR          R24 
Transcriptional regulator CtsR                     R55 
Inosose dehydratase                                R216 
Alpha-galactosidase                                R56 
Mn transport ATP-binding prot. MntB                R132, (R73) 
2-oxoglutarate dehydrogenase E2 
50S ribosomal prot. L14                            R17 
50S ribosomal prot. L22                            R11 
30S ribosomal prot. S8                             R47, R72, R79 
Surfactin synthase subunit 1                       R226 
Elongation factor Tu                               R384 
Putative adenine deaminase YerA 
Uncharact. HTH-type transcript. regulator YkoM 
Uncharacterized prot. YkyB 
Uncharacterized prot. YpiF                         R4 
Uncharact. transcript. regulatory protein YvfU 

Phosphorylated proteins, likely pArg 
Transcriptional regulatory prot. DegU 
Heat-inducible transcriptional repressor HrcA 
(R27) 

Ser/Thr/Tyr-phosphorylated proteins 
ComG operon prot. 1 
Branched-chain aa transaminase 1                   7249 
PTS system mannitol-specific EIICB                 S365 
50S ribosomal prot. L35 
30S ribosomal prot. S19 
Probable L-serine dehydratase, a chain 
Phage-like element PBSX prot. XkdO 
Uncharacterized prot. YceH 
Sporulation prot. YunB                             T165 
Uncharacterized prot. YvyG 
Uncharacterized prot. YxiB 
Extended Data Table 2 | Data collection and refinement statistics 

                             ClpC NTD bound to pArg (PDB: 5HBN) 
Data collection 
Space group                  P65 
Cell dimensions 
  a, b, c (Å)                84.6, 84.6, 29.9 
  α, β, γ (°)                90, 90, 120 
Resolution (Å)               42.3-1.60 (1.66-1.60)* 
Rmerge                       0.062 (0.55) 
Rmeas                        0.064 (0.58) 
Rpim                         0.015 (0.17) 
I/σ(I)                       29.62 (4.73) 
CC1/2                        0.999 (0.95) 
Completeness (%)             0.98 (0.83) 
Redundancy                   18.6 (11.0) 
Refinement 
Resolution (Å)               42.3-1.60 (1.66-1.60) 
No. reflections              16,041 (1,349) 
Rwork / Rfree                0.145 (0.195) / 0.166 (0.253) 
No. atoms incl. hydrogens    2,503 
  Protein                    2,347 
  Ligands and ions           116 
  Water                      40 
B factors                    28.7 
  Protein                    28.1 
  Ligand/ion                 35.1 
  Water                      36.3 
R.m.s. deviations 
  Bond lengths (Å)           0.009 
  Bond angles (°)            0.97 

*Values in parentheses are for highest-resolution shell. 
Extended Data Table 3 | Summary of protein expression and purification conditions 

Protein        Expression   Buffer A                                            Buffer B 
Bs ClpC        37°C, 3h     25 mM Tris pH 7.5, 300 mM NaCl                      25 mM Tris pH 7.5, 300 mM NaCl 
Bs ClpC^DWB    37°C, 3h     25 mM Tris pH 8, 300 mM NaCl                        25 mM Tris pH 7.5, 300 mM NaCl 
Bs ClpC^NTD    37°C, 2h     25 mM Tris pH 8.5, 150 mM NaCl, 1 mM TCEP           5 mM Tris pH 8.5, 50 mM NaCl, 1 mM TCEP 
Ec ClpA^NTD    37°C, 2h     25 mM Tris pH 8.5, 150 mM NaCl, 1 mM TCEP           5 mM Tris pH 8.5, 50 mM NaCl, 1 mM TCEP 
Bs ClpP        37°C, 3h     25 mM Tris pH 7.5, 150 mM NaCl, 10% glycerol        25 mM Tris pH 7.5, 150 mM NaCl, 10% glycerol 
Bs McsA        20°C, 5h     25 mM Tris pH 7.5, 300 mM NaCl, TCEP 1 mM           25 mM Tris pH 7.5, 100 mM NaCl, TCEP 1 mM 
Bs McsB        18°C, O/N    25 mM Tris pH 7.5, 300 mM KCl                       25 mM Tris pH 7.5, 50 mM KCl 
Bs McsB®4      18°C, O/N    25 mM Tris pH 7.5, 300 mM KCl                       25 mM Tris pH 7.5, 50 mM KCl 
Gs McsB        37°C, 3h     25 mM Hepes pH 7.5, 300 mM KCl                      25 mM Hepes pH 7.5, 100 mM KCl 
Gs McsB!”4     18°C, O/N    50 mM Tris pH 7.5, 50 mM NaCl                       50 mM Tris pH 7.5, 50 mM NaCl 
Bs MecA        25°C, 5h     25 mM Hepes pH 8, 300 mM KCl                        25 mM Hepes pH 8, 300 mM KCl 
Gs YwlE        37°C, 3h     25 mM Tris pH 7.5, 300 mM NaCl                      25 mM Tris pH 7.5, 100 mM NaCl 
Gs YwlE™8N     37°C, 3h     25 mM Tris pH 7.5, 300 mM NaCl                      25 mM Tris pH 7.5, 100 mM NaCl 

Bs, Bacillus subtilis; Ec, Escherichia coli; Gs, Geobacillus stearothermophilus; O/N, overnight. 
Break-induced telomere synthesis 
underlies alternative telomere maintenance 


Robert L. Dilley, Priyanka Verma, Nam Woo Cho, Harrison D. Winters, Anne R. Wondisford & Roger A. Greenberg 


Homology-directed DNA repair is essential for genome maintenance through templated DNA synthesis. Alternative 
lengthening of telomeres (ALT) necessitates homology-directed DNA repair to maintain telomeres in about 10-15% 
of human cancers. How DNA damage induces assembly and execution of a DNA replication complex (break-induced 
replisome) at telomeres or elsewhere in the mammalian genome is poorly understood. Here we define break-induced 
telomere synthesis and demonstrate that it utilizes a specialized replisome, which underlies ALT telomere maintenance. 
DNA double-strand breaks enact nascent telomere synthesis by long-tract unidirectional replication. Proliferating cell 
nuclear antigen (PCNA) loading by replication factor C (RFC) acts as the initial sensor of telomere damage to establish 
predominance of DNA polymerase δ (Pol δ) through its POLD3 subunit. Break-induced telomere synthesis requires the 
RFC-PCNA-Pol δ axis, but is independent of other canonical replisome components, ATM and ATR, or the homologous 
recombination protein Rad51. Thus, the inception of telomere damage recognition by the break-induced replisome 
orchestrates homology-directed telomere maintenance. 


Tremendous progress has been made in identifying the events respon- 
sible for recognizing and repairing DNA double-strand breaks (DSBs)". 
A complex aspect of this response is homology-directed DNA repair 
(HDR), which can involve numerous possibilities to capture homolo- 
gous regions of the genome to use for templated DNA synthesis and 
repair. The detailed order of molecular events that ensues after the initial 
sensing of DSBs to allow the execution of homology-directed synthesis 
remains enigmatic. Specifically, how the DNA damage response coor- 
dinates productive interactions between DNA replication complexes 
to perform break-induced DNA synthesis has not been extensively 
demonstrated in mammalian cells. ALT is a clinically relevant example 
of a DNA repair pathway that requires homology-directed synthesis 
to maintain telomeres in ~10-15% of human cancers. Additionally, 
such synthesis could represent an attractive therapeutic target against 
cancers, especially if it proves to be different from canonical S-phase 
replication. 


Telomere breaks stimulate long-tract synthesis 
To study homology-directed synthesis at ALT telomeres, we devel- 
oped a bromodeoxyuridine (BrdU) pulldown approach to isolate and 
quantify nascent telomeres synthesized following telomere-targeted 
DSBs generated by the fusion of the Shelterin component TRF1 to the 
FokI endonuclease (Fig. 1a). Using stable ALT-positive U2OS cell lines 
expressing TRF1-FokI under tetracycline-control, a 2-h damage induc- 
tion with wild-type TRF1-FokI, but not the FokI(D450A) nuclease-null 
mutant, resulted in a ~10-fold increase in nascent telomere synthesis 
in asynchronous and G2-enriched cells (Fig. 1b, c and Extended Data 
Fig. 1a-g). Concurrent synthesis of nascent C- and G-rich telomere 
strands was evident from 1h post-induction and became maximal at 
~2h (Fig. 1d). Nascent Alu repeat DNA was not increased by TRF1- 
FokI expression, demonstrating the specificity of break-induced 
telomere synthesis (Fig. 1d). 

To understand the nature of individual DNA synthesis events, we 
adapted the single-molecule analysis of replicated DNA (SMARD) 
technique for studying break-induced telomere synthesis. After 
induction of TRF1-FokI, U2OS cells were sequentially incubated with 
iododeoxyuridine (IdU) and chlorodeoxyuridine (CldU), genomic 
DNA was digested, and telomere fragments were isolated on the basis 
of size (Fig. 1e). The percentage of telomeres with IdU/CldU incorporation 
increased with the duration of TRF1-FokI induction (Fig. 1f, g). 
Break-induced telomere synthesis proceeded in a unidirectional fashion, 
often to the end of the telomere fragment. Nascent telomere tracts ranged 
in length from 5 to 70 kilobases (kb), with a median value of 19.8 kb 
(n= 46) that matched the median length of the overall telomere fibres 
observed (20.1 kb; n = 45) (Fig. 1h). Furthermore, ~80% of nascent 
telomere fragments were completely labelled and ~98% of nascent 
fragments had label on at least one of the ends. Taken together, these 
data suggest that DSBs at ALT telomeres induce long-tract telomeric 
DNA synthesis. 
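
As an illustration of how a median fibre length and a 95% confidence interval might be summarized, the following Python sketch uses a percentile bootstrap. The choice of bootstrap, the function name bootstrap_median_ci, and the example lengths are assumptions made for illustration; the text does not state how the interval was computed.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Return (median, lower, upper) using a percentile bootstrap.

    Assumption: a percentile bootstrap is used here only as one common
    way to attach a confidence interval to a median.
    """
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = medians[int((alpha / 2) * n_boot)]
    upper = medians[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(values), lower, upper


# Hypothetical nascent tract lengths in kb (not data from the paper).
example_lengths_kb = [5, 12, 18, 20, 22, 30, 70]
median_kb, ci_low, ci_high = bootstrap_median_ci(example_lengths_kb)
```

The interval bounds always fall within the range of the observed lengths, because every resampled median is drawn from the observed values.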

As a complementary approach, BrdU immunofluorescence at 
telomeres provides a means to assess spontaneous synthesis of ALT 
telomeres. Cell lines that utilize ALT, but not telomerase, displayed 
elevated BrdU incorporation at telomeres in a pattern distinct from 
S-phase replication (Extended Data Fig. 2a, b, d), consistent with previous 
reports. TRF1-FokI expression increased analogue incorporation 
at ALT telomeres in interphase and metaphase cells (Extended Data 
Fig. 2c, e), suggesting that telomere damage may be an initiating event 
for spontaneous ALT telomere synthesis. 

Expanding on our observations, we generated a panel of TRF1-FokI 
inducible lines from cells that either utilize ALT (U2OS, VA13, SKNFI) 
or telomerase (HeLa 1.3, HeLa S3, 293T) for telomere maintenance. 
Notably, all lines showed evidence of break-induced telomere synthesis 
by BrdU pulldown upon induction with TRF1-FokI (Extended Data 
Fig. 3a-c). This holds true not only across telomere maintenance mechanism, 
but also regardless of ATRX status, overall telomere length differences, 
and cell type (Extended Data Fig. 3a). Notably, a recent study 
provided evidence that replication stress can activate ALT mechanisms 
in primary and telomerase-positive cells. We propose that although 
any cell may have the capacity for break-induced telomere synthesis, 
ALT-positivity entails greater levels of telomere damage that promotes 
Pennsylvania, 421 Curie Boulevard, Philadelphia, Pennsylvania 19104, USA. 


54 | NATURE | VOL 539 | 3 NOVEMBER 2016 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Figure 1, panels a-h. Panel a, BrdU pulldown workflow: 1. Induce TRF1-FokI; 2. Pulse cells with BrdU for 2 h; 3. Isolate genomic DNA; 4. Shear to 100-300 bp; 5. IP with anti-BrdU antibody; 6. Probe for telomeres. Panel d axis: induction duration (h). Panel h axis: fibre length (kb); total telomere median = 20.1 ± 8.5, n = 45; nascent telomere median = 19.8 ± 5.3, n = 46.] 


Figure 1 | Break-induced telomere synthesis occurs by long-tract 
unidirectional telomeric recombination. a, Schematic of BrdU pulldown. 
IP, immunoprecipitation. b-d, BrdU pulldown dot blot for telomere 
content using a 32P-labelled telomere oligonucleotide (b) from U2OS 
cells induced (Ind) with TRF1-FokI for 2h, with quantification (c) and 
time course of C- and G-rich telomere strands compared to Alu repeats 
(d). e, Schematic of telomere SMARD. f-h, Representative images (f) of 
telomere (blue) labelled with IdU (red) and CldU (green) from U2OS 
cells induced (Ind) with TRF1-FokI, with quantification (g) and length of 
telomere fibres (h). Median length quantified ± 95% C.I. WT, wild-type; 
D450A, nuclease-null mutant. Data represent mean ± s.e.m. of three (c) or 
two (g) independent experiments. **P < 0.01, *P < 0.05. 


homology-directed DNA synthesis and telomere maintenance. 
Therefore, non S-phase telomere synthesis (Extended Data Fig. 2a, 
b, d) is apparent at baseline only in ALT cells. 


Break-induced telomere synthesis by alternative HDR 
Genetic studies in yeast demonstrated that break-induced replication 
is responsible for telomere recombination, which can proceed through 
Rad51-dependent and -independent mechanisms. Rad51, together 
with the Hop2-Mnd1 heterodimer, localizes to ALT-associated PML 
bodies (APBs) and facilitates long-range telomere movement and 
clustering in ALT cells. Cells lacking Hop2 from CRISPR-Cas9-mediated 
excision showed reduced telomere clustering, APB formation, 
and telomere exchanges in ALT-positive VA13 cells (Extended 
Data Fig. 4a-f). ATR is a damage-sensing kinase that signals replication 
stress and is important for ALT telomere integrity and cell survival. 
Disruption of ATR and Chk1 signalling by knockdown and 
small-molecule inhibitors reduced Hop2 recruitment to telomeres after 
TRF1-FokI-induced damage, whereas ATM disruption had no effect 
(Fig. 2a). Similarly, ATR knockdown restricted telomere mobility after 
TRF1-FokI induction in U2OS cells (Fig. 2b), thus implicating ATR 
and Rad51-Hop2 as critical for ALT telomere mobility. 
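
Telomere mobility of this kind is summarized in the Fig. 2b legend as mean squared displacement (MSD). The Python sketch below shows the standard time-averaged MSD for a single two-dimensional track sampled at equal time intervals. The function name, the track format, and the example coordinates are hypothetical; the authors' actual tracking and analysis pipeline is not described in this section.

```python
def mean_squared_displacement(track, max_lag=None):
    """Time-averaged MSD of one track.

    track: list of (x, y) positions sampled at equal time intervals.
    Returns a list whose element k-1 is the MSD at a lag of k frames.
    """
    n = len(track)
    if n < 2:
        raise ValueError("track needs at least two positions")
    if max_lag is None:
        max_lag = n - 1
    if not 1 <= max_lag <= n - 1:
        raise ValueError("max_lag must be between 1 and len(track) - 1")

    msd = []
    for lag in range(1, max_lag + 1):
        # Squared displacement for every pair of frames separated by `lag`.
        squared = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(n - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd


# Hypothetical track moving one unit per frame along x (not real data).
example_track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
example_msd = mean_squared_displacement(example_track)
```

A focus whose MSD curve flattens at long lags is confined, whereas a curve that keeps rising indicates longer-range movement, which is the kind of difference a mobility comparison between conditions would look for.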

We next asked whether ATR and Rad51-Hop2 are required for 
break-induced telomere synthesis. Surprisingly, ATR, Rad51, and 
Hop2 were all dispensable for synthesis. Conversely, knockdown of 
each gene paradoxically increased levels of nascent telomeres, which 
held true over an 8-h time course (Fig. 2c, Extended Data Fig. 5a). 
Figure 2 | Break-induced telomere synthesis occurs by alternative 
HDR and utilizes a non-canonical replisome defined by Pol δ. 
a, Quantification of co-localized Hop2 and telomere foci in TRF1-FokI 
expressing VA13 cells treated with the indicated small interfering RNAs 
(siRNAs) or inhibitors. b, Mean squared displacement (MSD) analysis 
of live-cell telomere movement from U2OS cells following TRF1-FokI 
induction. c-f, BrdU pulldown dot blots for telomere content 
(c, d, e, f) from U2OS cells induced (Ind) with TRF1-FokI for 2h, with 
rescue experiment (e), and quantifications (c, d, f). g, Quantification of 
telomere SMARD from U2OS cells induced with TRF1-FokI and treated 
with the indicated siRNAs. EV, empty vector; WT, reconstituted POLD3. 
Data represent mean ± s.e.m. of at least two independent experiments. 
***P < 0.001, **P < 0.01, *P < 0.05. 


Similarly, spontaneous ALT telomere synthesis did not require Rad51 
(Extended Data Fig. 5e, i). To investigate the long-term consequences 
of depletion of this pathway, we examined the telomere length of VA13 
HOP2 CRISPR clones. All of the 6 clones lacked detectable Hop2 pro- 
tein expression, with no telomere shortening observed at approximately 
population doubling (PD) 25 or longer time points (Extended Data 
Fig. 4g, h). Collectively, this provides evidence for Rad51-independent 
mechanisms of mammalian break-induced telomere synthesis and ALT 
telomere maintenance. Although ATR regulates damage signalling, tel- 
omere integrity, and survival in ALT cells, our data suggest it is not 
an essential component of the break-induced replisome at telomeres. 


Break-induced telomere synthesis requires Pol δ 

We next surveyed the replisome dependencies of break-induced 
telomere synthesis. Replicative DNA polymerases Pol δ, Pol ε, and 
Pol α-primase were previously implicated in yeast break-induced 
replication. Pol δ, including the POLD3 and POLD4 accessory and 
POLD1 catalytic subunits, was required for break-induced telomere 
synthesis (Fig. 2d, e, Extended Data Fig. 5a, b, d). Unexpectedly, Pol δ 
was required for synthesis of both C- and G-rich telomere strands, 
whereas Pol ε and Pol α-primase were dispensable as was the MCM2-7 
replicative helicase (Fig. 2d, f). Notably, depletion of POLD3 resulted 
Figure 3 | Rapid loading of PCNA acts as the initial sensor of telomere 
damage. a, Quantification of co-localized GFP-tagged replisome subunits 
and telomere foci from U2OS cells induced with TRF1-FokI for 2 h. 
b, Schematic of POLD3 and representative images of POLD3 deletion 
mutants. c, Quantification of co-localized endogenous PCNA and POLD3 
and telomere foci from U2OS cells induced with TRF1-FokI for 2 h. 
d, BrdU pulldown dot blot for telomere content from U2OS cells induced 
(Ind) with TRF1-FokI for 2h. e, Kinetics of PCNA and POLD3 loading at 
damaged telomeres in relation to Rad51, γH2AX, and DNA synthesis in 
U2OS cells. IF, immunofluorescence. f, Representative live-cell imaging 
of PCNA recruitment to damaged telomeres before telomere clustering in 
U2OS cells. Time in minutes shown in upper left corners. g, Quantification 
of the requirements for PCNA loading at damaged telomeres in U2OS 
cells. h, Quantification of spontaneous co-localized RFC1, PCNA, and 
POLD3 and telomere foci from a panel of ALT− and ALT+ cell lines. Fixed 
cell and live cell images were captured at 60× and 100× magnification, 
respectively. Data represent mean ± s.e.m. of two (a) or three (c, e, g, h) 
independent experiments. ****P < 0.0001. 


in ~2.5-fold less incorporation of IdU/CldU in telomere fibres after 
TRF1-FokI-induced breaks (Fig. 2g). POLD3 is also part of the Pol ζ 
complex involved in translesion synthesis. However, the catalytic 
subunit of Pol ζ (REV3L) as well as the other translesion synthesis proteins 
Pol η (POLH) and REV1 were not needed for break-induced telomere 
synthesis (Fig. 2d, Extended Data Fig. 5c). Therefore, the major 
function of POLD3 in break-induced telomere synthesis is through Pol δ. 
Notably, the requirements for break-induced telomere synthesis using 
TRF1-FokI faithfully recapitulate the requirements for spontaneous 
ALT telomere synthesis. Specifically, non S-phase telomere synthesis in 
three ALT lines required POLD3/Pol δ, but occurred independently of 
Pol ε, Pol α, and Pol ζ (Extended Data Fig. 5f-i). Collectively, these data 
define a non-canonical replisome involved in ALT telomere synthesis. 


RFC-PCNA is the initial sensor of telomere damage 

Pol δ showed robust recruitment to TRF1-FokI damage sites in 
U2OS cells, whereas Pol ε, Pol α-primase, and MCM2-7 were present 
at much lower levels (Fig. 3a, Extended Data Fig. 6a, b). POLD3 
facilitates interaction of the Pol δ complex with the PCNA clamp for 
processive synthesis and strand displacement. Notably, Pol δ has 
higher affinity for PCNA than does Pol ε, and PCNA is known to function 
in repair processes outside S-phase. Deletion of the PCNA-interacting 
peptide (PIP) box (Δ456-466) of POLD3 disrupted its 
recruitment to damage sites (Fig. 3b). Functionally, PCNA interaction 
with POLD3 facilitates recruitment of the whole Pol δ complex to 
damaged ALT telomeres (Extended Data Fig. 6c, d). The RFC1-5 
clamp loading complex was required for PCNA-POLD3 telomere 
localization, whereas the alternative clamp loader subunit, Rad17, was 
dispensable (Fig. 3c). Furthermore, the entire axis consisting of RFC1- 
PCNA-POLD3 localized in an inducible fashion to ~90% of damaged 
ALT telomeres and was required for break-induced telomere synthesis 
(Fig. 3d, Extended Data Fig. 5h, 6e, f). 

Both PCNA and POLD3 showed ~10-fold increases in telomere 
localization by 30 min after induction with TRF1-FokI (Fig. 3e). By 
contrast, Rad51 localization occurred more slowly and was maximal 
by 2h after induction (Fig. 3e). Peak telomere synthesis coincided 
with an increase in DSB signalling (Figs 1d, 3e). Time-lapse imag- 
ing revealed that GFP-PCNA localized to TRF1-FokI damage sites 
soon after they became visible and before ALT telomere merging 
events (Fig. 3f, n = 20 cells). Consistent with PCNA loading being an 
early damage response, its localization was independent of proximal 
damage response factors ATR, ATM, MRE11, or homologous recom- 
bination proteins Rad51, Hop2, and BRCA2 (Fig. 3g). Importantly, 
RFC1, PCNA, and POLD3 spontaneously localized to ~2-10% of tel- 
omeres specifically in ALT-positive cells, consistent with the presence 
of persistent damage at a subset of ALT telomeres (Fig. 3h). These 
data reveal that PCNA loading is the initial damage sensor at ALT 
telomeres, thus establishing a platform to assemble the break-induced 
replisome. 


POLD3 is critical for ALT telomere maintenance 

Pol32, the yeast homologue of POLD3, is required for recombination-dependent 
survivors of telomerase deficiency. Transient knockdown 
of POLD3 decreased spontaneous ALT telomere synthesis and single- 
telomere-exchange events by chromosome orientation fluorescence 
in situ hybridization (CO-FISH), but had no immediate effect on 
C-circles or telomere length (Extended Data Figs 5f-i, 7a, b). We 
investigated the consequences of prolonged POLD3 depletion on ALT 
telomere maintenance using CRISPR-Cas9 in U2OS cells. Although we 
were unable to generate surviving cells with complete loss of POLD3, 
we obtained 4 clones (c1-c4) with in-frame deletions and residual 
expression of POLD3 (Extended Data Fig. 7c-g). Notably, all 4 clones 
had reduced levels of the entire Pol δ complex (Fig. 4a, Extended Data 
Fig. 7g), consistent with a stabilizing role for POLD3 (ref. 26). Clones 
c1-c3 displayed accelerated telomere shortening at ~PD 25 compared 
to the empty guide control, whereas clone c4 had a more minor phenotype 
(Fig. 4b, Extended Data Fig. 7h, i). Telomere length in 5 clones 
with normal POLD3 expression (c5-c9) was unchanged at ~PD 25 
(Extended Data Fig. 8a). The telomere shortening observed in clones 
c1-c3 is greater than expected for cells lacking a telomere maintenance 
mechanism, representing a loss of ~800-1200 bp of telomeric repeats 
per cell division. However, telomeres did not continue to shorten over 
time in these clones. This is consistent with accelerated shortening and 
stabilization observed in other ALT lines in which telomere maintenance 
mechanisms have been partially impaired. Additional U2OS 
POLD3 CRISPR clones from an independent guide RNA also displayed 
shortened telomeres compared to the parental line or clones derived 
from the same guide RNA that failed to exhibit reduced POLD3 expres- 
sion, making it unlikely that the effects observed are due to clonal varia- 
tion (Extended Data Fig. 8c). Collectively, analysis of the mean telomere 
length of the 31 clones from both of the guide RNAs revealed a signif- 
icant decrease in clones with reduced POLD3 expression compared to 
those with normal POLD3 expression (Fig. 4c). In contrast, telomere 
length changes were not observed in 11 POLD3 CRISPR clones from 
telomerase-positive HeLa 1.3 cells (Extended Data Fig. 8d-f), suggesting 
an increased requirement of POLD3 for processive telomere synthesis 
during ALT. 

U2OS POLD3 CRISPR clones accumulated increased numbers of 
telomere dysfunction-induced foci (TIFs) (Fig. 4d, Extended Data 
Fig. 7i). C-circles are another marker of telomere maintenance specific 
to ALT-dependent cells. Clones c1-c3 had significantly decreased 
levels of C-circles that could be rescued by reconstituting wild-type 
[...] partial disruption of ALT activity and telomere maintenance in clones 
c1-c3 (Extended Data Fig. 7i). We propose that POLD3 is critical for 
the majority of nascent telomere synthesis during ALT and therefore 
underlies long-term telomere maintenance and ALT activity. 

repair mechanisms with differing kinetics of homology-directed 
telomere synthesis (Fig. 4f). We speculate that related processes are 
invoked at other vulnerable regions of the genome. The unique 
characteristics that differentiate this mechanism from scheduled 
S-phase replication may facilitate a better understanding of how alternative 
repair mechanisms enable genome evolution and enhance cancer 
cell fitness. 

Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 
Data reporting. No statistical methods were used to predetermine sample size. 
Cell culture. U2OS, HeLa 1.3, HeLa S3, DLD-1, and 293T cell lines were grown 
in DMEM (Thermo Fisher) with 10% calf serum and 1% penicillin/streptomycin. 
VA13, GM847, LM216T, and LM216J cell lines were grown in DMEM (Thermo 
Fisher) with 10% FBS and 1% penicillin/streptomycin. SKNFI cell line was grown 
in RPMI (Thermo Fisher) with 10% FBS and 1% penicillin/streptomycin. VA13 
cell line refers to WI-38 VA-13 subline 2RA. LM216T/J are matched lines. Cell 
lines were obtained from ATCC and tested negative for Mycoplasma using the 
MycoAlert PLUS Mycoplasma Detection Kit (Lonza). The U2OS TRF1-FokI 
inducible cell line was authenticated by STR analysis (ATCC). Other lines were 
validated by ALT characteristics. None of the cell lines used is listed as commonly 
misidentified by the International Cell Line Authentication Committee (ICLAC). 

ALT-positive lines used: U2OS, VA13, GM847, LM216J, SKNFI 

ALT-negative lines used: HeLa 1.3 (long telomere), HeLa S3, DLD-1, LM216T, 
293T 
BrdU immunofluorescence. Cells were pulsed with 100 µM BrdU (Sigma) for 
2h before fixation. After permeabilization, cells were denatured with 500 U ml−1 
DNase I (Roche) in 1x reaction buffer (20mM Tris-HCl (pH 8.4), 2mM MgCl2, 
50mM KCl in PBST) for 10-25 min at 37 °C in a humidified chamber. Coverslips 
were then washed and incubated with anti-BrdU antibody (BD) for 20 min at 
37°C followed by secondary antibody and telomere FISH. For metaphases, cells 
pulsed with BrdU were treated with 100 ng ml−1 colcemid for 90 min followed by 
75mM KCl for 30 min. Cells were fixed in 3:1 methanol:acetic acid, dropped onto 
slides, and allowed to dry overnight. Denaturation was performed with 2 N HCl for 
30 min at room temperature followed by antibody incubations as described above. 
BrdU pulldown dot blot. BrdU pulldown was adapted from a published protocol. 
Cells were pulsed with 100 µM BrdU (Sigma) for 2h before collection. 
Genomic DNA (gDNA) was isolated using phenol-chloroform extraction followed 
by resuspension in TE buffer. gDNA was then sheared into 100-300 bp fragments 
using a Covaris S220 sonicator. 1-4 µg sheared gDNA was denatured for 10 min at 
95°C and cooled in an ice-water bath. Denatured gDNA was incubated with 2 µg 
anti-IgG (Sigma) or anti-BrdU antibody (BD) diluted in immunoprecipitation 
buffer (0.0625% (v/v) Triton X-100 in PBS) rotating overnight at 4°C. The next 
day, samples were incubated with 30 µl Protein G magnetic beads (Pierce) that 
had been pre-bound to a bridging antibody (Active Motif) for 1h rotating at 4°C. 
Beads were subsequently washed three times with immunoprecipitation buffer 
and once with TE buffer. Beads were then incubated twice in elution buffer (1% 
(w/v) SDS in TE) for 15 min at 65°C. Pooled eluate was cleaned with ChIP DNA 
Clean & Concentrator kit (Zymo). Samples, along with 10% inputs, were diluted 
into 2× SSC buffer, treated at 95°C for 5 min, and dot-blotted onto an Amersham 
Hybond-N+ nylon membrane (GE). The membrane was then denatured in a 
0.5N NaOH 1.5M NaCl solution, neutralized, and ultraviolet crosslinked. The 
membrane was hybridized with 32P-labelled (TTAGGG)g oligonucleotides, unless 
otherwise noted, in PerfectHyb Plus Hybridization Buffer (Sigma) overnight at 
37°C. The next day, the membrane was washed twice in 2x SSC buffer, exposed 
onto a storage phosphor screen (GE Healthcare) and scanned using STORM 860 
with ImageQuant (Molecular Dynamics). All quantifications were performed in 
Fiji and normalized to 10% input. 
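
One plausible reading of "normalized to 10% input" can be sketched in Python as follows. The exact formula the authors applied in Fiji is not given, so the scaling below, the function names, and the example intensities are assumptions for illustration only.

```python
def percent_of_input(ip_signal, input_signal, input_fraction=0.10):
    """Express an IP dot-blot signal as a percentage of total input.

    Assumption: the input spot holds `input_fraction` (10%) of the DNA
    that went into the pulldown, so total input = input_signal / fraction.
    """
    if input_signal <= 0 or not 0 < input_fraction <= 1:
        raise ValueError("input_signal must be > 0 and 0 < input_fraction <= 1")
    total_input = input_signal / input_fraction
    return 100.0 * ip_signal / total_input


def fold_change(treated_percent, control_percent):
    """Ratio of normalized signals, e.g. induced versus uninduced cells."""
    if control_percent <= 0:
        raise ValueError("control_percent must be > 0")
    return treated_percent / control_percent


# Hypothetical integrated intensities (arbitrary units, not real data).
uninduced = percent_of_input(ip_signal=50.0, input_signal=1000.0)
induced = percent_of_input(ip_signal=500.0, input_signal=1000.0)
enrichment = fold_change(induced, uninduced)
```

Normalizing each pulldown to its own input corrects for differences in the amount of DNA loaded, so fold changes between conditions reflect BrdU incorporation at telomeres and not loading.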
Telomere single-molecule analysis of replicated DNA (SMARD). The SMARD 
assay was performed as previously described. U2OS cells were induced with 
TRF1-FokI for 20 min or 2h and were subsequently labelled by incubating with 
30 µM IdU for 2h, followed by 30 µM CldU for the next 2h. After pulsing, 10° 
labelled cells per condition were embedded in 1% agarose and lysed using detergents 
(100mM EDTA, 0.2% sodium deoxycholate, 1% sodium lauryl sarcosine 
and 0.2 mg ml−1 Proteinase K). The plugs were then washed several times with 
TE, treated with 100 µM PMSF, and then washed again with TE buffer followed 
by incubation with 1× CutSmart buffer (NEB) for 30 min. The DNA in the plugs 
was then digested overnight at 37°C using 50 U of both MboI and AluI (NEB) per 
plug. The digested plugs were then cast into a 0.7% low-melting point agarose gel 
and a distinct fragment running above 10 kb (containing telomeric DNA defined 
by Southern blotting) was excised, melted and stretched on slides coated with 
3-aminopropyltriethoxysilane (Sigma-Aldrich). After denaturation of the DNA 
strands using alkali buffer (0.1 M NaOH in 70% ethanol and 0.1% β-mercaptoethanol), 
the DNA was fixed using 0.5% glutaraldehyde and incubated overnight 
with biotin-OO-(CCCTAA), locked nucleic acid (LNA) probe (Exiqon) at 37 °C. 
Telomere FISH probes were then detected using the Alexa Fluor 405-conjugated 
streptavidin (Thermo-Fisher) followed by sequential incubation with the biotinylated 
anti-avidin antibody (Vector Laboratories) and additional Alexa 405-conjugated 
streptavidin. IdU and CldU were visualized using mouse anti-IdU (BD) and rat 
anti-CldU (Serotec) monoclonal antibodies followed by Alexa Fluor 568-goat anti-mouse 
and Alexa Fluor 488-goat anti-rat secondary antibodies (Life Technologies). 
Images were acquired using the NIS-element software (Nikon) and a Nikon eclipse 
80i microscope equipped with a 63× objective and a Cool Snap camera (MYO). 
For calculating the length of the telomeres and replication tracts, the line-scan 
function from Image J was used. For conversion of microns to kilobases, as 10 bp 
(equals one turn of the helix) has a linear length of 3.4nm, 0.26 microns corre- 
sponded to 1 kb of DNA. 
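
The length conversion can be sketched in Python as below. The default factor of 0.26 µm per kb is the value stated in the text. For comparison, the helix geometry quoted in the same sentence (3.4 nm per 10 bp) corresponds to 0.34 µm per kb, so the factor is left as a parameter. The function and constant names are hypothetical.

```python
STATED_UM_PER_KB = 0.26  # conversion factor given in the Methods text


def bform_um_per_kb(nm_per_turn=3.4, bp_per_turn=10):
    """Microns per kilobase implied by the quoted helix geometry."""
    nm_per_bp = nm_per_turn / bp_per_turn
    return nm_per_bp * 1000 / 1000  # nm per bp -> nm per kb -> µm per kb


def microns_to_kb(length_um, um_per_kb=STATED_UM_PER_KB):
    """Convert a measured fibre or tract length from microns to kilobases."""
    if um_per_kb <= 0:
        raise ValueError("um_per_kb must be positive")
    return length_um / um_per_kb


# Hypothetical line-scan measurement of 5.2 µm (not a value from the paper).
tract_kb = microns_to_kb(5.2)
```

Passing `um_per_kb=bform_um_per_kb()` to `microns_to_kb` shows how strongly the reported lengths depend on the calibration factor chosen.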

Plasmids, primers, siRNAs, and CRISPR sgRNAs. Death domain (DD)- 
Oestrogen receptor (ER)-mCherry-TRF1-FokI and Flag-TRF1-FokI constructs 
were cloned as previously described. Doxycycline-inducible TRF1-FokI lines were 
generated using the Tet-On 3G system. Briefly, Flag-DD-ER-mCherry-TRF1- 
FokI was cloned into the pLenti CMV TRE3G Puro Dest vector, which was intro- 
duced into cells engineered to co-express the reverse tetracycline transactivator 
3G (rtTA3G). N-terminal GFP-tagged proteins were generated by PCR amplifica- 
tion and ligation of cDNAs from the ProQuest HeLa cDNA Library (Invitrogen) 
into the pDEST53 (Invitrogen) mammalian expression vector. CRISPR lines were 
generated using a two-vector system (pLentiCas9-Blast and pLentiGuide-Puro). 
POLD3 reconstitution vector was generated by cloning POLD3 cDNA (RefSeq 
NM_006591.2) into the pOZ-N-Flag-HA retroviral vector followed by site-directed 
mutagenesis of siRNA binding sites. Sanger sequencing of POLD3 
CRISPR clones was performed on gDNA fragments cloned into a TOPO TA vector 
(Thermo Fisher). 

Transient plasmid transfections were carried out with LipoD293 (Signagen), 
and siRNA transfections with Lipofectamine RNAiMax (Invitrogen) according 
to manufacturer’s instructions. Analyses were performed 16h after transfection 
of plasmids, and 72h after siRNA transfection. All siRNAs were used at a final 
concentration of 20nM. 

The following primers were used for RT-PCR: 

POLD3 primer set 1: 5′-GAGTTCGTCACGGACCAAAAC-3′, 5′-GCCAGACACCAAGTAGGTAAC-3′; 

POLD3 primer set 2: 5′-ACCAACAAGGAAACGAAAACAGA-3′, 5′-GGTTCCGTGACAGACACTGTA-3′; 

The following siRNA sequences were used: 

Control siRNA (siCtrl): QIAGEN AllStars Negative Control siRNA; 

ATR siRNA (siATR): 5′-AACCUCCGUGAUGUUGCUUGAdTdT-3′; 

ATM siRNA (siATM): 5′-GCGCCUGAUUCGAGAUCCUdTdT-3′; 

RAD51 siRNA (siRAD51): #1, 5′-UGUAGCAUAUGCUCGAGCG-3′, #2, 5′-CCAGAUCUGUCAUACGCUA-3′; 

HOP2 siRNA (siHOP2): #2, 5′-AAGAGAAGAUGUACGGCAA-3′, #3, 5′-UCUGCUUAAAGGUGAAAGUAGCAGG-3′; 

BRCA2 siRNA (siBRCA2): 5′-GAAGAAUGCAGGUUUAAU-3′; 

RFC1 siRNA (siRFC1): 5′-GAAGGCGGCCUCUAAAUCAUU-3′; 

RAD17 siRNA (siRAD17): 5′-CAGACUGGUUGACCCAUCUU-3′; 

PCNA siRNA (siPCNA): 5′-GGAGGAAGCUGUUACCAUAUU-3′; 

MRE11 siRNA (siMRE11): Dharmacon SMARTpool M-009271-01-0005; 

POLD3 siRNA (siPOLD3): #1, Invitrogen 4390824-s21045, #2, Invitrogen 4392420-s21046; 

POLD1 siRNA (siPOLD1): #1, Invitrogen 4392420-s615, #2, Invitrogen 4392420-s616; 

POLD4 siRNA (siPOLD4): 5′-GCAUCUCUAUCCCCUAUGAUU-3′; 

POLE siRNA (siPOLE): #1, 5′-GGACAGGCGUUACGAGUUCUU-3′, #2, 5′-CUCGGAAGCUGGAAGAUUAUU-3′; 

POLA1 siRNA (siPOLA1): #1, Invitrogen 4392420-s10772, #2, Invitrogen 4392420-s10774; 

REV3L siRNA (siREV3L): 5′-CCCACUGGAAUUAAUGCACAAUU-3′; 

PRIM1 siRNA (siPRIM1): Invitrogen HSS108448; 

MCM2 siRNA (siMCM2): Invitrogen HSS106390; 

MCM7 siRNA (siMCM7): Invitrogen HSS106405; 

POLH siRNA (siPOLH): 5′-CTGGTTGTGAGCATTCGTGTA-3′; 

REV1 siRNA (siREV1): 5′-ATCGGTGGAATCGGTTTGGAA-3′; 

Knockdown efficiencies were evaluated by western blot (Extended Data Fig. 9). 

The following CRISPR sgRNA sequences were used: 

sgPOLD3: #1, 5′-GCAGATAAAGCTGGTCCGCCA-3′, #2, 5′-GAAATATAGACGAGTTCGTCA-3′; 

sgHOP2: #1, 5′-GCCGGACGTTGTAGTTGCTCG-3′, #2, 5′-GCGGGAAAGGCGATGAGTAA-3′, #3, 5′-GCGGGAGGTAACGGCGCCGT-3′, #4, 5′-GAGTAGATTCACCCGTTGTC-3′, #5, 5′-GACCCATGAGAGCCCGACAAC-3′. 
Antibodies. The following antibodies were used: anti-BrdU (mouse B44, BD 
347580; rat BU1/75, AbD Serotec OBT0030G), anti-ATRX (rabbit H-300, Santa 
Cruz sc-15408), anti-53BP1 (rabbit, Novus NB100-904), anti-γH2AX (mouse 
JBW301, Millipore 05-636), anti-Flag (mouse M2, Sigma F1804), anti-PML 
(mouse PG-M3, Santa Cruz sc-966), anti-Rad51 (rabbit H-92, Santa Cruz sc-8349; 
mouse 14B4, Abcam ab-213), anti-Hop2/PSMC3IP (rabbit, Novus NBP1-92301), 
anti-POLD3 (mouse 3E2, Abnova H00010714-M01), anti-POLD1 (mouse 
607, Abcam ab10362; rabbit, Bethyl A304-005A), anti-POLD2 (rabbit, Bethyl 
A304-322A), anti-POLD4 (mouse 2B11, Abnova H00057804-M014A), anti-POLE 
(mouse 93H3A, Pierce MA5-13616; rabbit, Novus NBP1-68470), anti-POLE3 (rabbit, 
Bethyl A301-245A), anti-POLA1 (rabbit, Bethyl A302-851A), anti-MCM7 
(rabbit, Bethyl A302-584A), anti-MCM4 (rabbit, Bethyl A300-193A), anti-MCM5 
(rabbit, Abcam ab75975), anti-RFC1 (rabbit, Bethyl A300-320A), anti-PCNA 
(mouse PC10, CST #2586), anti-ATR (goat N-17, Santa Cruz sc-1887), anti-PRIM1 
(rabbit H300, Santa Cruz sc-366482), anti-Rad17 (goat, Bethyl A300-151A), 
anti-REV3L (rabbit, GeneTex GTX100153), anti-POLH (rabbit, Bethyl A301-231A), 
anti-REV1 (rabbit H300, Santa Cruz sc-48806), anti-GAPDH (rabbit 14c10, CST 
#2118), anti-α-tubulin (mouse TU-02, Santa Cruz sc-8035). 

Drugs. Doxycycline was used at a concentration of 40 ng ml−1 for 16-24h to induce 
expression of TRF1-FokI. Shield-1 (Cheminpharma LLC) and 4-hydroxytamoxifen 
(4-OHT) (Sigma-Aldrich) were both used at a concentration of 1 µM for 2h, unless 
otherwise stated, to allow for TRF1-FokI stabilization and translocation into 
the nucleus. RO-3306 (Selleck Chemicals) was used at a concentration of 10 µM 
for 20-24h. G2 enrichment was confirmed by propidium iodide staining and 
flow cytometry. Colcemid (Roche) was used at a concentration of 100 ng ml−1. 
The ATR inhibitor VE-821 (Selleck Chemicals) and Chk1 inhibitor LY2603618 
(Selleck Chemicals) were used at a concentration of 5 µM and 1 µM, respectively, 
for 24h. 

Western blot. Cells were lysed in RIPA buffer supplemented with cOmplete protein
inhibitor cocktail (Roche) and Halt phosphatase inhibitor cocktail (Thermo) on ice
and subsequently spun down at max speed at 4°C. The supernatant was removed and
protein concentration determined using the Protein Assay Dye Reagent (Bio-Rad).
20-40 µg of protein was run on a 4-12% Bis-Tris gel (Invitrogen). Proteins were
transferred onto an Amersham Protran 0.2 µm nitrocellulose membrane (GE) and
blocked with 5% milk. Membranes were incubated with primary antibodies overnight
at 4°C. The next day membranes were incubated with secondary antibodies
for 1 h at room temperature and subsequently developed using Western Lightning
Plus-ECL (PerkinElmer) or SuperSignal West Femto (Thermo).
Immunofluorescence, immunofluorescence-FISH, TIF assay, APB assay, 
and CO-FISH. Cells grown on coverslips were fixed in 4% paraformaldehyde 
for 10 min at room temperature. Coverslips were then permeabilized in 0.5% 
Triton X-100 for 5 min at 4°C (for most antibodies) or 100% cold methanol for
10 min at −20°C (for anti-PCNA). Primary antibody incubation was performed
at 4°C in a humidified chamber overnight unless otherwise indicated. Coverslips 
were washed and incubated with appropriate secondary antibody for 20 min at 
37°C, then mounted onto glass slides using Vectashield mounting medium with 
DAPI (Vector Labs). For immunofluorescence-FISH, coverslips were re-fixed in 
4% paraformaldehyde for 10 min at room temperature after secondary antibody 
binding. Coverslips were then dehydrated in an ethanol series (70%, 90%, 100%) 
and allowed to air dry. Dehydrated coverslips were denatured and incubated 
with TelC-Cy3 peptide nucleic acid (PNA) probe (Panagene) in hybridization 
buffer (70% deionized formamide, 10 mM Tris (pH 7.4), 0.5% Roche blocking 
solution) overnight at room temperature in a humidified chamber. The next day, 
coverslips were washed and mounted as described above. Images were acquired 
with a QImaging RETIGA-SRV camera connected to a Nikon Eclipse 80i microscope.
For TIF assay, cells were scored for co-localized 53BP1 and telomere foci by
immunofluorescence-FISH. For APB assay, cells were scored for the number of
PML-telomere colocalizations by immunofluorescence-FISH. Hop2 immunofluorescence
and CO-FISH experiments were performed as previously described’.
Pulsed-field gel electrophoresis and in-gel hybridization. Telomere gels were
performed using telomere restriction fragment (TRF) analysis. Genomic DNA
was digested using AluI and MboI (NEB). 4-10 µg of DNA was run on a 1%
PFGE agarose gel (Bio-Rad) in 0.5× TBE buffer using the CHEF-DRII system
(Bio-Rad) at 6 V cm⁻¹; initial switch time 5 s, final switch time 5 s, for 16 h at
14°C. The gel was then dried for 4 h at 50°C, denatured in a 0.5 N NaOH 1.5 M
NaCl solution, and neutralized. Gel was hybridized with ³²P-labelled (CCCTAA)n
oligonucleotides in Church buffer overnight at 42°C. The next day, the membrane
was washed four times in 4× SSC buffer, exposed onto a storage phosphor screen
(GE Healthcare) and scanned using STORM 860 with ImageQuant (Molecular
Dynamics). Telomere length was determined using TeloTool software*?.

C-circle assay. C-circle assay was performed as previously described**. Genomic
DNA was digested using AluI and MboI (NEB). 30 ng of digested DNA was combined
with 0.2 mg ml⁻¹ BSA, 0.1% Tween, 1 mM each dNTP without dCTP, 1× φ29
Buffer (NEB) and 7.5 U φ29 DNA polymerase (NEB). Samples were incubated
for 8 h at 30°C followed by 20 min at 65°C. Samples were then diluted in 2× SSC
buffer and dot-blotted onto an Amersham Hybond-N+ nylon membrane (GE).
Membrane was ultraviolet crosslinked and then hybridized with ³²P-labelled
(CCCTAA)n oligonucleotides in PerfectHyb Plus Hybridization Buffer (Sigma)
overnight at 37°C. The next day, the membrane was washed twice in 2× SSC
buffer, exposed onto a storage phosphor screen (GE Healthcare) and scanned using
STORM 860 with ImageQuant (Molecular Dynamics).

Co-immunoprecipitation and chromatin immunoprecipitation (ChIP). Cells were
lysed in HEPES immunoprecipitation buffer (10 mM HEPES (pH 8), 2 mM EDTA,
0.1% NP-40) supplemented with 5 mM DTT, 1 mM PMSF, and 1× cOmplete
protein inhibitor cocktail (Roche) on ice and subsequently spun down at max speed
at 4°C. The supernatant was removed and protein concentration determined using
the Protein Assay Dye Reagent (Bio-Rad). 25 µg protein was removed for input.
500 µg protein was diluted to 1 mg ml⁻¹ in HEPES immunoprecipitation buffer
and pre-cleared with 10 µl Protein G magnetic beads (Pierce) for 1 h rotating at
4°C. Protein lysate was then incubated with 10 µg anti-IgG (Sigma) or anti-POLD1
antibody (Abcam) rotating overnight at 4°C. The next day, samples were incubated
with 30 µl Protein G magnetic beads (Pierce) that had been pre-bound to a
bridging antibody (Active Motif) for 1 h rotating at 4°C. Beads were subsequently
washed five times with HEPES immunoprecipitation buffer. Proteins were eluted
by incubating beads with 2× sample buffer with BME for 5 min at 95°C. Samples
were analysed by western blot. ChIP was performed as previously described and
analysed by western blot and dot blot*®.

Telomere content dot blot. 400 ng of genomic DNA was diluted into 2× SSC
buffer, treated at 95°C for 5 min, and dot-blotted onto an Amersham Hybond-N+
nylon membrane (GE). Membrane was then denatured in a 0.5 N NaOH 1.5 M
NaCl solution, neutralized, and UV crosslinked. Membrane was hybridized with
³²P-labelled (CCCTAA)n or Alu repeat oligonucleotides in PerfectHyb Plus
Hybridization Buffer (Sigma) overnight at 37°C. The next day, the membrane was
washed twice in 2× SSC, exposed onto a storage phosphor screen (GE Healthcare)
and scanned using STORM 860 with ImageQuant (Molecular Dynamics).

Live cell imaging and image analysis. Live cell imaging was performed and analysed
as previously described’. Fixed cell and live cell images were captured at
60× and 100× magnification, respectively. Microscope images and dot blots were
prepared and analysed using Fiji. Southern blot telomere gel images were prepared 
using Fiji and were not cropped to exclude any part of the presented lanes. Western 
blot gel images were prepared using Adobe Photoshop and cropped to present 
relevant bands. Uncropped western blot images are shown in Supplementary Fig. 1. 
Statistics. All statistical analysis was done using GraphPad Prism 5 software. 
Unpaired t-tests were used to generate two-tailed P values. 
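
As a rough illustration of the unpaired comparison described above, the following sketch computes a Student's t statistic and its degrees of freedom using only the Python standard library. It assumes equal variances in the two groups, which is an assumption of this sketch and not something the text states; the measurements are hypothetical. The two-tailed P value would then be read from a t distribution with the returned degrees of freedom, a step not implemented here.

```python
import math
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's unpaired t statistic and degrees of freedom (equal variances assumed)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance across both groups
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    # Standard error of the difference between the two means
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


# Hypothetical measurements from three independent experiments per condition
control = [1.0, 2.0, 3.0]
treated = [2.0, 3.0, 4.0]
t_stat, dof = unpaired_t(control, treated)
```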
Extended Data Figure 1 | An inducible system for studying break-induced
telomere synthesis. a, Schematic of inducible TRF1-FokI
system. b-d, Characterization of U2OS inducible TRF1-FokI system by
western blot (b), immunofluorescence (c), and telomere ChIP (d).
e, Agarose gel of sonicated DNA prepared for BrdU pulldown.
f, g, BrdU pulldown dot blot for telomere content (f) from asynchronous
or G2-enriched U2OS cells induced (Ind) with TRF1-FokI for 2 h,
with cell-cycle profiles by propidium iodide staining (g). Images
were captured at 60× magnification. Dox, doxycycline; S, Shield-1;
T, 4-hydroxytamoxifen; DD, destabilization domain; ER, oestrogen
receptor; rtTA, reverse tetracycline transactivator; TRE3G, tetracycline
response element; WT, wild-type; D450A, nuclease-null mutant.
Extended Data Figure 2 | Visualization of spontaneous ALT telomere
synthesis. a-c, BrdU immunofluorescence assay to visualize spontaneous
ALT telomere synthesis, with representative images of VA13 cells (a) and
quantification of a panel of ALT⁻ and ALT⁺ cell lines (b) and U2OS cells
induced with TRF1-FokI for 2 h (c). d, e, Representative images of BrdU
immunofluorescence of metaphases from spontaneous GM847 cells (d)
and U2OS induced (+Ind) with TRF1-FokI for 2 h upon release from
RO-3306 (e). Images were captured at 60× magnification. Data represent
mean ± s.e.m. of three independent experiments. ***P < 0.001.
Extended Data Figure 3 | Break-induced telomere synthesis occurs
independently of telomere maintenance mechanism. a, A panel of ALT⁻
and ALT⁺ inducible TRF1-FokI cell lines tested for TRF1-FokI
and ATRX expression by western blot and nascent telomere synthesis by
BrdU pulldown dot blot for telomere content after induction (Ind) with
TRF1-FokI for 2 h. b, c, BrdU pulldown dot blot for telomere content
(b) from HeLa 1.3 cells induced (Ind) with TRF1-FokI for 2 h, with
quantification (c). Data represent mean ± s.e.m. of two independent
experiments. *P < 0.05.
Extended Data Figure 4 | Hop2 contributes to telomere clustering but
is dispensable for telomere length maintenance. a-h, CRISPR/Cas9-mediated
excision of HOP2 (sgHOP2) in VA13 cells, with western blot of
populations (a). Analysis of Hop2 co-localization with telomere foci by
IF-FISH (b), telomere focus size by FISH (c), APBs by PML co-localization
with telomere foci (d, e), and telomere exchanges by CO-FISH (f) from
sgHOP2 #2 population. Analysis of clones (c1-c6) by western blot (g)
and TRF pulsed-field gel at ~PD 25 (h). Peak intensity of telomere length
is indicated by red dot. Images were captured at 60× magnification.
Data represent mean ± s.e.m. of at least two independent experiments.
***P < 0.0005, **P < 0.005, *P < 0.05.
Extended Data Figure 5 | Requirements for break-induced and
spontaneous ALT telomere synthesis. a, BrdU pulldown dot blot
timecourse for telomere content from U2OS induced (Ind) with TRF1-FokI
for indicated times and treated with indicated siRNAs. b, c, BrdU
pulldown dot blots for telomere content from U2OS induced (Ind) with
TRF1-FokI for 2 h and treated with indicated siRNAs. d, BrdU pulldown
dot blot for telomere content from HeLa 1.3 induced (Ind) with
TRF1-FokI for 2 h and treated with indicated siRNAs, with quantification.
e-i, Analysis of spontaneous ALT telomere synthesis using BrdU
immunofluorescence from VA13 (e-h) and GM847 and LM216J (i) treated
with indicated siRNAs. Images were captured at 60× magnification.
Data represent mean ± s.e.m. of two (d) or three (e-i) independent
experiments. ****P < 0.0001, **P < 0.01, *P < 0.05.
Pol δ complex from cell lines treated with TRF1-FokI. Asterisk denotes mutant. Images were captured at 60× magnification. Data represent
Extended Data Figure 7 | POLD3 is critical for telomere maintenance
in ALT-dependent cells. a, b, Analysis of transient POLD3 depletion by
C-circle dot blot (a) from U2OS and CO-FISH (b) from VA13 (n = 1780
ends for siCtrl, n = 1637 ends for siPOLD3). c, TRF analysis from U2OS
populations at ~PD 25. Peak intensity of telomere length is indicated
by red dot. d, Schematic of U2OS POLD3 CRISPR (sgPOLD3) cloning
strategy with western blot. e-g, Analysis of POLD3 expression from U2OS
clones c1-c4 by qPCR (e), POLD1 Co-IP (f), and darker exposure of
western blot from Fig. 4a (g). Asterisk denotes non-specific band.
h, Quantification of relative telomere content by dot blot from U2OS
clones c1-c4. i, Heat map summarizing decreases (blue), increases (red),
or no change (white) in telomere maintenance from U2OS clones c1-c4
as compared to U2OS control. j, k, POLD3-reconstituted CRISPR clones
analysed for C-circles by dot blot (j) and Pol δ expression by western
blot (k). EV, empty vector; WT, reconstituted POLD3. Data represent
mean ± s.e.m. of two independent experiments. **P < 0.01, *P < 0.05.
Extended Data Figure 8 | Extended analysis of POLD3 CRISPR clones.
a, b, TRF analysis by pulsed-field gel (a) and C-circle dot blot (b) from
U2OS POLD3 CRISPR (sgPOLD3) clones with normal POLD3 expression
(c5-c9) at ~PD 25. c, U2OS POLD3 CRISPR clones from an independent
guide RNA (sgPOLD3 #2) analysed by TRF and western blot at ~PD 25.
d, TRF analysis by pulsed-field gel from HeLa 1.3 populations at ~PD 25.
e, Schematic of HeLa 1.3 POLD3 CRISPR cloning strategy with western
blot. f, TRF analysis by pulsed-field gel from HeLa 1.3 clones c1-c11 at
~PD 25. Peak intensity of telomere length is indicated by red dot.
Extended Data Figure 9 | Knockdown efficiencies. a-n, Western blots of U2OS or VA13 cells treated with indicated siRNAs. Asterisk denotes
non-specific band.


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


ARTICLE 


doi:10.1038/nature20124 


Defining synonymous codon compression 
schemes by genome recoding 


Kaihang Wang¹, Julius Fredens¹*, Simon F. Brunner¹*, Samuel H. Kim¹, Tiongsun Chia¹ & Jason W. Chin¹,²


Synthetic recoding of genomes, to remove targeted sense codons, may facilitate the encoded cellular synthesis of 
unnatural polymers by orthogonal translation systems. However, our limited understanding of allowed synonymous 
codon substitutions, and the absence of methods that enable the stepwise replacement of the Escherichia coli genome with 
long synthetic DNA and provide feedback on allowed and disallowed design features in synthetic genomes, have restricted 
progress towards this goal. Here we endow E. coli with a system for efficient, programmable replacement of genomic DNA 
with long (>100-kb) synthetic DNA, through the in vivo excision of double-stranded DNA from an episomal replicon by 
CRISPR/Cas9, coupled to lambda-red-mediated recombination and simultaneous positive and negative selection. We 
iterate the approach, providing a basis for stepwise whole-genome replacement. We attempt systematic recoding in an 
essential operon using eight synonymous recoding schemes. Each scheme systematically replaces target codons with 
defined synonyms and is compatible with codon reassignment. Our results define allowed and disallowed synonymous 
recoding schemes, and enable the identification and repair of recoding at idiosyncratic positions in the genome. 


The design and synthesis of genomes provides a powerful approach 
for understanding and engineering biology’ ®. Genome synthesis has 
the potential to elucidate synonymous codon function’, accelerate 
metabolic engineering®, and facilitate genetically encoded unnatural 
polymer synthesis”!°. 

Methods that replace the genome in sections’, provide feedback on 
precisely where a given design fails and on how to repair it, and can 
be rapidly iterated for whole-genome replacement, would increase 
our ability to understand and manipulate the information encoded 
in genomes. 

In E. coli, the workhorse of synthetic biology, progress on replac- 
ing large sections of the genome has been slower than in naturally 
recombinogenic organisms®''. Sequence-specific recombinases may 
be introduced into E. coli to direct recombination at defined target 
sequences that must be introduced into the genome in advance, and 
these approaches cannot be iterated'*. Lambda-red-mediated homologous
recombination", using linear double-stranded (ds)DNA that
is electroporated into cells, can be programmed to target any region
of the genome via short (50-bp) homology regions at either end of 
a linear dsDNA (referred to herein as HR1 and HR2). However, this 
approach is commonly limited to inserting or replacing only 2-3 kb 
of genomic DNA, and has not been used to introduce long sequences 
into the genome. 

We are interested in reprogramming the genetic code for the in vivo
biosynthesis of unnatural polymers”. Reassigning particular codons in 
the genome to synonymous codons would enable removal of their cog- 
nate transfer RNAs, compression of the number of synonymous codons 
used to encode certain natural amino acids, and the reassignment 
of certain sense codons, and an expanded set of quadruplet codons 
to evolved orthogonal translation systems for unnatural polymer 
synthesis'*!°. However, recoding the E. coli genome requires both the
development of methods for efficiently replacing genomic DNA with 
synthetic DNA and an understanding of the best synonymous codon 
substitutions, from many possible choices, for recoding. 

Nature chooses one triplet codon from up to six potential synonyms 
to encode each amino acid at each position in the genome; this choice
can define transcriptional’® or translational’’ regulatory elements,
translation speed!®)9, mRNA folding’, gene expression, co-translational 
folding”°”! and protein production levels’, and is likely to have further 
undiscovered roles. Synonymous codons may have distinct roles at 
different sites in the genome, and there may be epistatic interactions 
amongst codons within and between genes””-**. Our limited understanding
of the factors driving codon choice suggests that the best
synonymous codon substitutions to implement for synthetic recoding 
should be identified empirically. 

Here we endow E. coli with a system that enables efficient, program- 
mable, one-step introduction of long synthetic DNA into the genome, 
as insertions or replacements, and iterate the approach for stepwise 
replacement of longer genomic regions. Using our approach we inves- 
tigate different synonymous recoding schemes for replacing the same 
target codons with distinct sets of synonyms, in an operon rich in both 
target codons and essential genes, providing insight into allowed and 
disallowed schemes for genome recoding and synonymous codon 
compression. 


Inserting DNA into the genome by REXER 

The overall efficiency of lambda-red-mediated recombination proto- 
cols is the product of the transformation efficiency for linear dsDNA 
and the efficiency with which the linear dsDNA mediates intracel- 
lular recombination. The overall efficiency decreases markedly with 
increasing dsDNA length, and we hypothesized that this decrease 
results primarily from challenges in efficiently delivering intact dsDNA 
into cells. To address this challenge, we envisioned introducing the 
DNA of interest into E. coli in an episomal replicon, and excising the 
dsDNA of interest to facilitate lambda-red-mediated recombination. 
To select for the correct integrants we envisioned the use of simulta- 
neous positive and negative selection, to select for the integration of a 
positive selection marker from the replicon into the genome and the 
loss of a negative selection marker from the genomic locus targeted for 
replacement; such a double selection strategy substantially enhances 
integration at the target locus by lambda-red-mediated recombination 
(Extended Data Fig. 1). 


¹Medical Research Council Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK. ²Department of Chemistry, Cambridge University, Cambridge CB2 1EW, UK.
Figure 1 | Efficient, programmable insertion of very long synthetic
DNA (s. DNA) into the genome of E. coli. a, REXER 2 and REXER 4.
CRISPR protospacer sequences are blue and orange rectangles,
respectively. Triangles indicate spacer RNAs that program cleavage
within colour-matched protospacers. REXER 4 augments REXER 2 by
adding two extra protospacers (purple and red rectangles) and triggering
cleavage with four spacer RNAs. +1 is KanR, −1 is rpsL, +2 is CmR,
−2 is sacB. b, REXER 2 and REXER 4 are dependent on the CRISPR/Cas9
system and recombination. For experiments that omit CRISPR/Cas9
or the recombination machinery, the efficiency is significantly reduced
(P< 10%, two-tailed t-test, for the null hypothesis that the efficiency is
independent of CRISPR/Cas9 or lambda red for REXER 2 and REXER 4).
Controls omit either spacer RNA or lambda red beta. Data show
mean ± s.d. (n = 4, two biological replicates with two technical replicates
of each for experiments that omit lambda red alpha/beta (Red α/β)
or CRISPR/Cas9; n = 6, three biological replicates with two technical
replicates of each for other experiments). c, The efficiency of REXER 2 and
REXER 4 is constant for insertions between 2 kb and 90 kb. CFU, colony
forming units. The data show the mean ± s.d. (n = 6, three biological
replicates with two technical replicates of each for 2-kb insertion; n = 4,
two biological replicates with two technical replicates of each for 9-kb
and 90-kb insertions). It was not possible to obtain a 90-kb linear dsDNA
product in vitro for classical lambda red recombination, and our data
reflect this, rather than the efficiency of recombination per se. It is well
established that lambda red recombination efficiency falls off rapidly with
linear dsDNA length.


We created E. coli MDS42'P-K88/'K (a genome-minimized strain of
E. coli”> in which the genomic copy of rpsL contains a K43R mutation
conferring resistance to streptomycin, and the −1/+1 selection
cassette encoding an rpsL-KanR fusion inserted between cra and
mraZ) containing a bacterial artificial chromosome (BAC) in which
the −2/+2 cassette (encoding sacB-CmR) is flanked by the HR1 and
HR2 sequences and Cas9 target sites (containing protospacer-PAM
(protospacer adjacent motif) sequences) and expressing lambda red
(alpha/beta/gamma), Cas9*9, and tracrRNA” (Fig. 1a). The addition
of a plasmid encoding spacer RNAs targeting the protospacers”°
within the BAC target sites to these cells, and selection for the gain of
resistance to both chloramphenicol (gain of +2) and streptomycin (loss
of −1 from the genome, and loss of the backbone of the BAC) led to
replacement of the sequence between HR1 and HR2 in the genome
with the sequence between HR1 and HR2 from the BAC (Fig. 1a and
Extended Data Fig. 2).

Genomic replacement was strictly dependent on CRISPR/Cas9 
and the lambda red recombination machinery (Fig. 1b), and targeted 
to the desired genomic locus (Extended Data Fig. 1c, d); consistent 
with the CRISPR/Cas9-mediated excision of the dsDNA between 
HR1 and HR2 in the BAC and lambda-red-mediated integration of
this sequence between HR1 and HR2 in the genome. We named our
approach REXER 2 (replicon excision for enhanced genome engineer- 
ing through programmed recombination; ‘2’ indicates the number of 
CRISPR/Cas9 cuts). 

To investigate the dependence of REXER 2 on the length of DNA 
inserted into the genome, we created BACs with 9 kb or 90 kb of 
DNA inserted between HR1 and —2/+2 (Fig. 1c and Extended Data 
Fig. 2). The insertions contain a luxABCDE operon”’, which is 
sufficient to generate bioluminescence in E. coli. We transformed 
each BAC into E. coli MDS42'P“K8/'K and implemented the REXER 2 
protocol. All cells selected on chloramphenicol and streptomycin 
integrated the lux operon at the correct locus and were bioluminescent 
(Extended Data Fig. 2). Moreover, whereas the efficiency of classical 
recombination drops markedly from 10* colony forming units (CFU) 
for a 2-kb insertion to approximately 10 CFU for a 9-kb insertion, 
the efficiency of REXER 2 is constant, at 10* CFU for all insertions 
(Fig. 1c). 

Next we investigated whether the efficiency of REXER could be 
improved by making two additional CRISPR/Cas9-mediated double- 
strand breaks in the genome between HR1 and HR2 (Fig. 1a). The
resulting REXER 4 protocol led to replacement of the sequence between
HR1 and HR2 in the genome with the sequence between HR1 and HR2
from the BAC (Extended Data Fig. 2) and destruction of all four Cas9 
target sites. REXER 4 was strictly dependent on both the CRISPR/Cas9 
system and the lambda red recombination machinery (Fig. 1b), and led 
to integration at the correct locus (Extended Data Fig. 2). Like REXER 2, 
the efficiency of REXER 4 was independent of length for insertions 
tested (up to 90 kb), and REXER 4 further increased the efficiency of 
synthetic DNA insertion with respect to REXER 2 by 10%-fold while 
maintaining insertion at the correct locus (Fig. 1c and Extended Data 
Fig. 1c, d). 


Replacing genome sections by REXER 

Next, we demonstrated that REXER could be used to efficiently 
replace 100 kb of the E. coli genome in a single step. We targeted the
region from mraZ to pyrH for replacement and inserted the —1/+1 
selection cassette at the 5’ end of this region. We assembled a BAC 
from DNA fragments in Saccharomyces cerevisiae*® in which the 
100-kb region between mraZ and pyrH was watermarked along its 
length by genes from the lux operon (Extended Data Fig. 3). REXER 2
yielded 2 x 10* CFU per reaction, of which 80% were bioluminescent, 
and REXER 4 yielded 5 x 10° CFU per reaction, of which 50% were 
bioluminescent (Extended Data Fig. 3). Further characterization 
confirmed the integration of the lux watermarks at the expected loci
for all bioluminescent colonies (Extended Data Fig. 3). These results 
demonstrate that REXER enables the replacement of genomic regions 
with synthetic DNA. 


Iterative REXER for genome replacement 

Iteratively replacing large sections of the genome with synthetic 
DNA via REXER will enable genome stepwise interchange synthesis 
(GENESIS) for whole-genome replacement (Fig. 2a). Towards this 
goal, we demonstrated iterative REXER (Extended Data Fig. 4). The 
genome created in a first round of REXER, which introduces —2/+2, 
provides a direct template for a second round of REXER, using a 
BAC that contains distinct positive and negative selection markers 
(—1/+1) (Extended Data Fig. 4). This product is a template for a 
third round of REXER; thus, the approach can be iterated (Extended 
Data Fig. 4). 
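
The alternation of selection cassettes between rounds can be sketched as a small planning function. This is only an illustration under stated assumptions: the cassette contents follow the figure legend (−1/+1 is rpsL/KanR, −2/+2 is sacB/CmR), while the function name `plan_rounds` and the dictionary layout are hypothetical and not part of the published protocol.

```python
# Cassettes as named in the text: -1/+1 is rpsL/KanR, -2/+2 is sacB/CmR.
CASSETTES = {
    "1": {"negative": "rpsL", "positive": "KanR"},
    "2": {"negative": "sacB", "positive": "CmR"},
}


def plan_rounds(n_rounds, genome_cassette="1"):
    """For each round, list the marker gained from the BAC and the marker lost from the genome."""
    plan = []
    for round_no in range(1, n_rounds + 1):
        # The BAC always carries the cassette that the genome does not
        incoming = "2" if genome_cassette == "1" else "1"
        plan.append({
            "round": round_no,
            "gain": CASSETTES[incoming]["positive"],
            "lose": CASSETTES[genome_cassette]["negative"],
        })
        # The cassette just integrated becomes the target of the next round
        genome_cassette = incoming
    return plan


plan = plan_rounds(3)
```

Starting from a genome that carries −1/+1, the first round selects for gain of CmR and loss of rpsL, the second for gain of KanR and loss of sacB, and so on.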
To explicitly demonstrate the stepwise replacement of long regions
of genomic DNA with synthetic DNA for GENESIS, we used cells in
which we replaced the 100-kb genomic region between mraZ and
pyrH by REXER (Extended Data Fig. 3 and Fig. 2b) for a second step
of REXER. This second step introduced 124 kb of DNA spanning frr
to mhpT and the hygromycin B phosphotransferase gene (hph). We
confirmed the replacement of 220 kb of the genome with 230 kb of
synthetic DNA in two steps (Fig. 2c, d). This compares favourably with
the largest replacement in the naturally recombinogenic S. cerevisiae
(270 kb, 11 steps)®.


Testing synonymous recoding schemes 

We used REXER to define synonymous substitutions that are disal- 
lowed and poorly tolerated and synonymous substitutions that are 
allowed and can be implemented at many positions in the genome. 
To define a system for experimental investigation we identified: 
i) the codons to target for removal; ii) the codons with which the target 
codons might be replaced to define recoding rules; and iii) a region of 
the genome in which to test recoding rules. 

We chose target codons that, when removed from the genome, would 
enable the removal of all the tRNAs that decode them, and where 
removal of these tRNAs would not remove all decoding of the remain- 
ing synonymous codons in the genome; these are the minimum criteria 
for removing a sense codon from the genome to enable its unambig- 
uous reassignment (Extended Data Fig. 5). We focused on removing 
Figure 2 | Iterating REXER for GENESIS.
a, Iterative genomic replacement by REXER
will enable genome replacement in a series of
steps. b, Iterative REXER replaces 220 kb of the
E. coli genome with 230 kb of synthetic DNA in
two steps. LuxA, B, C, D and E (cyan rectangles)
are necessary and sufficient for luminescence.
hph (violet rectangle) is the hygromycin B
phosphotransferase gene, which confers
resistance to hygromycin B. c, Cells phenotype
correctly through rounds of REXER. genome",
parental cell line; clones A and B, independent
clones from the first round of REXER; clones
C and D, independent clones from the second
round of REXER; Lumi, luminescence; Cm,
chloramphenicol; Kan, kanamycin; Suc, sucrose;
Strep, streptomycin. d, Cells genotype correctly
through rounds of REXER. For gel source images,
see Supplementary Fig. 1.
serine, leucine and alanine codons that fulfill these criteria, as these are
the three codon sets for which the aminoacyl-tRNA synthetases do not 
recognize the anticodons of their cognate tRNAs”. This will facilitate 
reassignment of the target codon (or up to four quadruplet derivatives), 
following deletion of host tRNAs, to orthogonal synthetase-tRNA pairs 
in orthogonal translation'*. 
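
The two minimum criteria for choosing target codons can be expressed as a simple check over a map from each tRNA to the codons it decodes. The sketch below is illustrative only: the function name and the decoding table `toy_decoding` are hypothetical placeholders and do not reproduce the real E. coli codon-anticodon assignments.

```python
def meets_removal_criteria(targets, decoding):
    """Check whether deleting every tRNA that reads a target codon
    still leaves each remaining synonymous codon decoded."""
    targets = set(targets)
    all_codons = set().union(*decoding.values())
    remaining_codons = all_codons - targets
    # Every tRNA that decodes at least one target codon is deleted
    kept = {name: codons for name, codons in decoding.items()
            if not (codons & targets)}
    # Each non-target synonym must still be read by some surviving tRNA
    return all(any(codon in codons for codons in kept.values())
               for codon in remaining_codons)


# Hypothetical decoding table for one six-codon box (not real assignments)
toy_decoding = {
    "tRNA_a": {"TCG"},
    "tRNA_b": {"TCA", "TCG", "TCT"},
    "tRNA_c": {"TCC", "TCT"},
    "tRNA_d": {"AGC", "AGT"},
}

ok_pair = meets_removal_criteria({"TCG", "TCA"}, toy_decoding)
bad_single = meets_removal_criteria({"AGC"}, toy_decoding)
```

In this toy table, removing TCG and TCA passes the check because the other four codons keep a decoder, whereas removing AGC alone fails because the only tRNA reading AGT would be deleted with it.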

We defined candidate synonymous replacements for the target 
codons by identifying the closest match for the target codons, as 
judged by either codon adaptation index (cAi)*°, tRNA adaptation 
index (tAi)*!"?, or a third metric we define (translation efficiency, tE) 
(Extended Data Table 1 and Methods). These considerations led us to 
investigate eight recoding schemes (Fig. 3a). 
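
Choosing the closest match under a given metric amounts to a nearest-value lookup among the synonyms. The following is a minimal sketch assuming a precomputed per-codon score; the function name and the numbers in `toy_metric` are hypothetical placeholders, not measured cAi, tAi or tE values.

```python
def closest_synonym(target, synonyms, metric):
    """Return the synonym whose metric value is nearest to the target codon's value."""
    return min(synonyms, key=lambda codon: abs(metric[codon] - metric[target]))


# Hypothetical metric values (placeholders, not measured cAi, tAi or tE scores)
toy_metric = {"TCG": 0.30, "TCT": 0.90, "TCC": 0.75, "AGT": 0.35, "AGC": 0.60}
choice = closest_synonym("TCG", ["TCT", "TCC", "AGT", "AGC"], toy_metric)
```

Running the same lookup with a different metric table can return a different synonym, which is how distinct metrics give rise to distinct recoding schemes for the same target codon.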

We identified the E. coli cell division operon as an ideal target in which 
to test these synonymous recoding schemes because it i) is rich in essential 
genes (12 out of 15 genes in the region are essential)**, ii) contains 
genes expressed at a range of levels™, iii) includes genes encoding 
membrane proteins* (a class of proteins for which co-translational 
folding is known to be affected by synonymous codon choice), 
iv) includes several proteins that interact and for which the ratios of 
proteins expressed are distinct and crucial**° (which will favour inter- 
genic epistatic interactions amongst codons), and v) is rich in the target 
codons (Fig. 3b and Extended Data Table 2). We anticipated that these 
features would ensure that the region captures important properties of 
the genome, and that the success or failure of synonymous recoding in 
the region would be reflected in viability and growth or a lack thereof. 
Figure 3 | Systematic and defined synonymous recoding in an E. coli
operon rich in essential genes. a, Identifying codon targets for removal
(grey) and the synonyms to which they are recoded (pink) in each
recoding scheme (rs.). Lines indicate codon-anticodon interactions.
Replacements were chosen by cAi, tAi, or tE. Application of each recoding
scheme genome-wide would allow the targeted codons to be completely
removed from the E. coli genome and, following deletion of the cognate
tRNA genes, codon reassignment to orthogonal translation systems
for unnatural polymer synthesis. b, Identifying a target operon rich in
target codons and essential genes to test recoding schemes. The top panel
indicates the positions of essential genes. In the bottom three panels, the
y axis scores the number of the indicated target codons in essential genes at
the genomic position indicated on the x axis. The mraZ to ftsZ region (red)
was identified in the highest-scoring 20-kb region across the E. coli MDS42
genome for all targeted codons. c, Position and density of targeted codons
in the mraZ to ftsZ region. The positions of targeted codons (the indicated
sense codons plus TAG to TAA) are coloured in red. Pink regions with
red outlines indicate duplicated regions, which refactor” overlapping open
reading frames to enable independent recoding of the downstream open
reading frames.
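
The search for the highest-scoring 20-kb region described in panel b can be sketched as a sliding-window count over the genomic positions of target codons in essential genes. This is a minimal illustration assuming positions are given as integer coordinates; the function name and the example coordinates are hypothetical, and the scoring actually used for the figure may differ.

```python
def best_window(positions, window=20_000):
    """Return (start, count) for the window of the given width that
    contains the most target-codon positions."""
    pos = sorted(positions)
    best_start, best_count = None, 0
    right = 0
    for left, start in enumerate(pos):
        # Advance the right edge to the first position outside [start, start + window)
        while right < len(pos) and pos[right] < start + window:
            right += 1
        count = right - left
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Hypothetical coordinates of target codons located in essential genes
toy_positions = [100, 5000, 19000, 50000, 52000, 53000, 60000, 69999]
region = best_window(toy_positions)
```

Only windows that begin at a codon position are examined, which is sufficient because any optimal window can be shifted right to start at its first codon without losing any.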


Allowed and disallowed recoding schemes 

We designed DNA sequences in which each of the recoding schemes 
was implemented within all of the fifteen genes simultaneously. Overall, 
the schemes investigate the consequences of 1,468 codon changes 
and 2,347 nucleotide changes (Fig. 3c). The DNA for each scheme 
was synthesized de novo and assembled into a BAC in S. cerevisiae, 
and genomic recoding via REXER was investigated (Extended Data 
Figs 6, 7). 

serine recoding scheme 1 with ftsA 407 AGT changed to AGC (as in serine 
recoding schemes 2 and 3) is plotted in orange. This mutation repairs 
the deleterious effect of ftsA 407 AGT without reintroducing the codons 
targeted for removal. 


Following REXER, we sequenced 16 independent clones from each 
recoding scheme. We observed chimaeras between the wild-type 
genomic DNA and the recoded DNA in several cases, consistent with 
recombination-mediated crossover between the recoded sequence and 
the starting genome; these chimaeras defined a recoding landscape. 
We aligned the individual recoding landscapes to create a ‘compiled 
recoding landscape’ (Extended Data Figs 6, 7) that reveals peaks and 
plateaus for synonymous substitutions that are allowed and valleys or 
troughs for synonymous substitutions that are consistently disallowed. 
We observe clear differences in the extent to which replacement of 
the same target codons by different synonymous codons are tolerated 
(Fig. 4a–c and Extended Data Figs 6, 7). 

We first investigated the serine recoding schemes 1-3 (Fig. 4a). For 
scheme 1, 0% of clones were completely recoded, but for scheme 2 and 
scheme 3, 88% of clones were fully recoded. By contrast, none of the 
leucine recoding schemes tested (schemes 4-6) led to complete recod- 
ing, and for schemes 4 and 5 recoding failed catastrophically, indicating 
that the synonymous substitutions had phenotypic consequences at 
many sites in the operon (Fig. 4b). Finally, the two alanine recoding 
schemes tested (schemes 7 and 8) had markedly different outcomes 
(Fig. 4c). Recoding scheme 7 led to 75% of clones being completely 
recoded at all 374 positions, whereas no clones were fully recoded by 
scheme 8. The doubling times for all fully recoded clones were compa- 
rable to each other and to a control E. coli strain (Extended Data Fig. 8). 
Overall, this work successfully removed up to 373 sense codons across 
20 kb from an operon rich in essential genes in a single strain. Thus, 
the scale of sense codon removal was much greater than in previously 
reported work that investigated one gene at a time’”. 

Our data reveal the marked differences between precisely defined 
recoding schemes that are obscured when the choice of synonymous 
substitution is varied*’. For serine recoding, recoding by schemes 2 and 
3 is allowed, while scheme 1 recoding is not, even though the codons 
used for replacement in scheme 1 differ from those used in schemes 2 
and 3 by only a single base (AGT vs AGC), and are decoded by the same 
tRNA (with anticodon GCT) via wobble and Watson-Crick decoding, 
respectively (Fig. 3a). Similarly, for alanine codons, scheme 7 recoding 
is allowed while scheme 8 recoding fails catastrophically. These recod- 
ing schemes differ only in the conversion of a single base (GCT vs 
GCC) in the allowed and disallowed substitution for GCA. Again, both 
of the new codons are decoded by the same set of tRNAs (Fig. 3a). cAi, 
tAi and tE all produce at least one successful recoding, but no single 
metric predicts which synonymous recoding will be successful. These 
observations underscore the importance of empirically determining 
the best systematic and well-defined synonymous recoding scheme 
for each codon. 

E. coli consistently rejects a single-codon mutation (TCG to AGT) at 
codon 407 of ftsA in recoding scheme 1 (Fig. 4a). Attempts to introduce 
the ftsA 407 TCG-to-AGT mutation (without additional recoding at 
other positions in the genome) failed (Extended Data Fig. 8). By contrast, 
we were able to quantitatively recode ftsA 407 TCG to the synonymous 
TCT codon (Extended Data Fig. 8). These results demonstrate that the 
ftsA 407 TCG-to-AGT mutation is deleterious. 

Mutation of the codon at position 407 in ftsA from AGT to AGC (the 
codon found at this position in recoding schemes 2 and 3) is sufficient 
to repair recoding scheme 1 (Fig. 4d and Extended Data Figs 7, 8). This 
mutation markedly alters REXER-mediated recoding, increasing the 
fraction of fully recoded sequences from 0% to 94% and the fraction of 
recoding at codon 407 of ftsA from 0% to 100% (Extended Data Fig. 8). 
We also successfully introduced this mutation into recoding scheme 
1 by combining single-stranded DNA recombineering with REXER 
(Extended Data Fig. 8). The growth of E. coli was not affected by the 
successful recoding schemes (Extended Data Fig. 8). These results 
demonstrate that the major defect in recoding scheme 1 results from 
AGT being disallowed at position 407 of ftsA, though it is tolerated 
at many other positions. Since TCG, TCT and AGC are allowed at 
position 407 of ftsA but AGT (which shares nucleotides with allowed 
codons at each position of the triplet) is disallowed, we conclude that 
the problem at this codon lies in the entire triplet. These experiments 
exemplify how REXER may be used to identify idiosyncratic positions in 
the genome that are refractory to recoding by otherwise well-tolerated 
recoding schemes, and repair the recoding scheme by the introduction 
of alternative codons at these idiosyncratic positions. 
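
The argument that the defect lies in the whole triplet, rather than in any single base, can be checked mechanically. The following is a minimal sketch, assuming only the codons named in the text; the function name `shared_bases` is hypothetical and introduced for illustration.

```python
# Serine codons in the standard genetic code.
SERINE_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}

# Codons reported in the text for position 407 of ftsA.
ALLOWED = ["TCG", "TCT", "AGC"]
DISALLOWED = "AGT"


def shared_bases(codon, allowed):
    """For each triplet position, list the allowed codons carrying the same base."""
    return [[a for a in allowed if a[i] == codon[i]] for i in range(3)]


# All four codons encode serine, so the protein sequence is unchanged.
assert set(ALLOWED) | {DISALLOWED} <= SERINE_CODONS

shared = shared_bases(DISALLOWED, ALLOWED)
for position, donors in enumerate(shared, start=1):
    base = DISALLOWED[position - 1]
    print(f"position {position}: {base} also occurs in allowed codon(s) {donors}")
```

Every position of AGT is matched by at least one allowed codon, which is the observation behind the conclusion that no individual nucleotide accounts for the defect.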


Discussion 
We have generated an efficient approach to enable both the pro- 
grammed insertion of large synthetic DNA sequences into the E. coli 
genome and the replacement of sections of the E. coli genome with 
synthetic DNA. The approach can be iterated and may enable replace- 
ment of the entire E. coli genome with synthetic DNA in approximately 
fourteen steps (Fig. 2a), given the length independence of REXER, 
demonstrated herein, and the ability of E. coli to accept 300-kb BACs. 
Each step takes only a few days to implement and convergent syntheses 
may further accelerate complete genome synthesis. The strategy will 
enable radical, high-density changes to the genome that are not acces- 
sible through site-directed mutagenesis approaches’”*, and enable 
diverse applications including recoding and metabolic engineering. 
We anticipate that this approach may be extended to facilitate genome 
engineering in other organisms. 

We have simultaneously recoded several genes in an essential 
operon using eight well-defined synonymous recoding rules that are 
compatible with codon reassignment for unnatural polymer synthesis. 
Our results reveal marked differences in the extent to which different 
synonymous replacements for the same target codons are allowed. Our 
approach also enables both the identification and repair of idiosyn- 
cratic positions within the ‘recoding landscape’ where a precise codon 
substitution that is allowed at many other positions in the operon is 
disallowed. Our investigation empirically identifies precisely defined 
schemes for sense codon removal and synonymous replacement for 
genome-wide application. 

Note added in proof: Additional work on variable codon substitution*” 
was recently reported??””. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Sequences. Sequences of plasmids, BACs, and modified genomic loci are provided 
in Supplementary Data 1-5. 

Construction of selection cassettes, cell strains, and plasmids. Two double 
selection cassettes were constructed. The −1/+1 is a fusion between the negative 
selection marker rpsL (−1) encoding the essential ribosomal protein S12 and conferring 
sensitivity to streptomycin in an rpsLK43R genomic background, and the positive 
selection marker KanR (+1) encoding the kanamycin resistance gene neomycin 
phosphotransferase II. The rpsL-KanR cassette was expressed as two separate 
proteins from a single mRNA driven by constitutive transcription from the 
wild-type rpsL promoter. The −2/+2 is a fusion between the negative selection marker 
sacB (−2) conferring sensitivity to sucrose, and the positive selection marker CmR 
(+2) encoding the chloramphenicol resistance gene chloramphenicol acetyl 
transferase. The sacB-CmR cassette was expressed as two separate proteins from a single 
mRNA driven by constitutive transcription from the EM7 promoter. Both selection 
cassettes were synthesized de novo. 

The minimum genome E. coli strain MDS42 was used as the starting strain”. 
A K43R mutation was introduced into the rpsL gene to create MDS42^rpsLK43R to 
confer resistance to streptomycin in the absence of an additional wild-type copy of 
rpsL and sensitivity to streptomycin in the presence of any additional copy of 
wild-type rpsL. The −1/+1 cassette rpsL-KanR was inserted between positions 89,061 
and 89,587 in the MDS42^rpsLK43R genome to create MDS42^rpsLK43R/rK. 

pKW20_CDFtet_pAraRedCas9_tracrRNA was constructed by assembling 
multiple PCR fragments using Gibson Assembly. The plasmid backbone and 
replication origin is from pCDFDuet-1 plasmid (Addgene), in which the spectin- 
omycin resistance marker is replaced with a tetracycline resistance marker from 
pBR322 plasmid (New England Biolabs). The araC gene, the arabinose promoter 
(pAra), and the lambda red (alpha/beta/gamma) genes were PCR amplified from 
pRed/ET plasmid (GeneBridges). The open reading frame of Cas9 was PCR 
amplified from pCas9 plasmid”° and placed downstream of lambda red alpha. The 
tracrRNA with its endogenous promoter was PCR amplified from pCas9 plasmid”® 
and placed in the same orientation downstream of the araC gene. The 
pKW20_CDFtet_pAraRed(Δβ)Cas9_tracrRNA was derived from 
pKW20_CDFtet_pAraRedCas9_tracrRNA by inserting GTAC between the 314th and 315th 
nucleotides of the lambda red beta open reading frame, which leads to a translational 
frameshift and thus inactivation of lambda red beta. 

pKW21_MBlamp_Spacer0 was constructed by assembling two PCR fragments 
using Gibson Assembly Master Mix (New England Biolabs). The pMB1 
replication origin and ampicillin resistance marker were PCR amplified from pBR322 
plasmid (New England Biolabs). The CRISPR array with no functional spacer 
RNA (hence the nomenclature 0) between BamHI and EcoRI was PCR amplified 
from pCRISPR”*. pKW21_MBlamp_Spacer0 was verified by sequencing. CRISPR 
arrays with two or four different spacer RNA sequences for directing REXER 2 
or REXER 4, respectively, with interspaced direct repeats were commercially 
synthesized. The arrays were cloned into pKW21_MBlamp_Spacer0, replacing 
the empty CRISPR array to create different pKW21_MBlamp_Spacers × 2 or 
pKW21_MBlamp_Spacers × 4 plasmids. The final pKW21_MBlamp_Spacers 
plasmids were sequence verified. A related version of pKW21_MBlerm_Spacers 
plasmids was prepared by replacing the ampicillin resistance marker in 
pKW21_MBlamp_Spacers with an erythromycin resistance marker. 

The BAC holding the synthetic DNA was constructed by assembling multiple 
fragments. The BAC backbone is based on pBeloBAC11 (New England BioLabs) 
from nucleotide 1542 to 7041 with the addition of the double selection cassette 
—2/+2 and the negative selection marker —1, and assembled using Gibson 
Assembly Master Mix (New England BioLabs). An alternative arrangement 
utilizes —1/+1 coupled with —2. The synthetic DNA was always flanked by 
AvrII sites, which also function as PAMs and part of the protospacer for CRISPR/ 
Cas9. 

Assembling short synthetic DNA onto BAC using Gibson Assembly. The 
pBAC_HR(89,061)-sC-HR(89,587)_r was constructed by assembling three PCR 
fragments using Gibson Assembly. The first fragment was the 2.2-kb −2/+2 
sacB-CmR cassette flanked with HR1 (89,012-89,061; all numbering is from the MG1655 
reference sequence) and HR2 (89,587-89,636) and further flanked with two AvrII 
sites; the second fragment was the −1 rpsL gene with the rrnC terminator; and 
the third fragment was the pBeloBAC11 backbone from nucleotides 1,542 to 
7,041. The assembled pBAC_HR(89,061)-sC-HR(89,587)_r was selected on Luria 
broth (LB) agar plates with 18 μg/ml chloramphenicol and sequence verified. The 
pBAC_HR(89,061)-rK-HR(89,587)_s was similarly constructed using the −1/+1 
rpsL-KanR cassette flanked with HR1 (89,012-89,061) and HR2 (89,587-89,636) 
and further flanked with two AvrII sites, the −2 sacB gene with rrnC terminator, 
and the pBeloBAC11 backbone. The pBAC_HR(89,061)-T5Lux-sC-HR(89,587)_r 
was constructed by inserting a PCR product of an artificial lux operon between 
HR1 and the −2/+2 sacB-CmR cassette in the 
pBAC_HR(89,061)-sC-HR(89,587)_r. 

Assembling long synthetic DNA onto BAC using recombination in S. cerevisiae. 
Long synthetic DNA fragments (>20 kb) were assembled in S. cerevisiae. The 
pBeloBAC11 backbone was converted into a BAC/YAC shuttle vector by introducing 
a S. cerevisiae replication centromere CEN and URA3 selection marker (from 
S. cerevisiae vector pRS316, Addgene). The BAC/YAC shuttle vector holding long 
synthetic DNA was assembled from 5-16 DNA fragments in S. cerevisiae”. 

Classical recombination and simultaneous double selection recombination 
protocol. The sacB-CmR cassette was PCR amplified using primers containing 
HR1 and HR2. In classical recombination, 3 μg of this purified PCR product was 
transformed into 100 μl electro-competent MDS42^rpsLK43R/rK cells, which were 
pre-transformed with the pRed/ET plasmid and induced to express the lambda 
red components. The cells were recovered in 4 ml super optimal broth (SOB) 
medium for 1 h at 37°C and then diluted to 100 ml LB and incubated for 4 h at 
37°C with shaking. The culture was then spun down and re-suspended in 4 ml 
of LB and spread in serial dilutions on selection plates of LB agar with 18 μg/ml 
chloramphenicol. 

In simultaneous double selection recombination (DOSER), 3 μg of the same 
PCR product was transformed into 100 μl electro-competent MDS42^rpsLK43R/rK 
cells following the same transformation and recovery protocol as above. The 
culture was then spun down and re-suspended in 4 ml LB and spread in serial 
dilutions on selection plates of LB agar with 18 μg/ml chloramphenicol and 50 μg/ml 
streptomycin. 

Multiple colonies from classical recombination and DOSER were picked for 
phenotyping. Colony PCRs of multiple clones from classical recombination and 
DOSER were performed using primer pairs flanking the genomic locus 89,061 
to 89,587 with MDS42^rpsLK43R/rK, MDS42^rpsLK43R, and Milli-Q filtered water with 
no resuspended colony as controls. All PCR products were run in parallel to NEB 
2-Log DNA Ladder (New England Biolabs) and sequence verified by Sanger 
sequencing. 

REXER protocol. MDS42^rpsLK43R/rK cells were double transformed with 
pKW20_CDFtet_pAraRedCas9_tracrRNA and pBAC_HR(89,061)-sC-HR(89,587)_r and 
plated on LB agar plates supplemented with 2% glucose, 10 μg/ml tetracycline and 
18 μg/ml chloramphenicol. Individual colonies were inoculated into LB medium 
with 10 μg/ml tetracycline and 18 μg/ml chloramphenicol, and grown overnight at 
37°C with shaking. The overnight culture was diluted in LB medium with 10 μg/ml 
tetracycline and 18 μg/ml chloramphenicol to OD600 = 0.05 and grown at 37°C 
with shaking for around 3 h until OD600 ≈ 0.3. Arabinose powder was added to 
the culture to reach a final concentration of 0.5% and the culture was incubated for 
one additional hour at 37°C with shaking. The cells were harvested at OD600 ≈ 0.6, 
and made electro-competent in 1/500th of the culture volume. 

We electroporated 3 μg pKW21_MBlamp_Spacers × 2 or 
pKW21_MBlamp_Spacers × 4 plasmid into 100 μl of competent cells. The cells were recovered in 4 ml 
SOB medium for 1 h at 37°C and then diluted to 100 ml LB supplemented with 
50 μg/ml ampicillin and 10 μg/ml tetracycline and incubated for 4 h at 37°C with 
shaking. The culture was spun down and re-suspended in 4 ml LB and spread in 
serial dilutions on selection plates of LB agar with 18 μg/ml chloramphenicol and 
50 μg/ml streptomycin. The plates were incubated at 37°C overnight, and the 
efficiency was calculated by counting colonies. Multiple colonies were picked, 
resuspended in Milli-Q filtered water, and arrayed on LB agar plates or LB agar plates 
supplemented with 18 μg/ml chloramphenicol or 50 μg/ml kanamycin. Colony PCR 
was also performed from resuspended colonies using the primer pair flanking the 
genomic locus 89,061-89,587. 

The resulting colonies with the −2/+2 sacB-CmR cassette replacing the 
−1/+1 rpsL-KanR cassette at the genomic locus 89,062-89,586 were incubated 
in LB without ampicillin, to lose the pKW21_MBlamp_Spacers × 2 or 
pKW21_MBlamp_Spacers × 4 plasmid. The resulting cells were double transformed 
with pKW20_pCDFtet_pAraRedCas9_tracrRNA and 
pBAC_HR(89,061)-rK-HR(89,587)_s. An individual colony was picked, inoculated into LB, and prepared 
into electro-competent cells as previously described. We electroporated 3 μg 
pKW21_MBlerm_Spacers × 2 or pKW21_MBlerm_Spacers × 4 plasmid into 
the pre-induced cells. The cells were recovered in 4 ml SOB medium for 1 h at 
37°C and then diluted to 100 ml LB supplemented with 25 μg/ml erythromycin 
and 5 μg/ml tetracycline and incubated for 4 h at 37°C with shaking. The culture 
was spun down and re-suspended in 4 ml LB and spread in serial dilutions on 
selection plates of LB agar with 3% sucrose and 25 μg/ml kanamycin. After 
incubating the selection plate at 37°C overnight, multiple colonies were picked, 
resuspended in Milli-Q filtered water, and arrayed on LB agar plates, or LB agar plates 
supplemented with 18 μg/ml chloramphenicol or 50 μg/ml kanamycin. Colony 
PCR was performed from resuspended colonies using the primer pair flanking 
the genomic locus 89,061-89,587. 

The pBAC_HR(89,061)-T5Lux-sC-HR(89,587)_r, 
pBAC_HR(89,061)-90kb/Lux-sC-HR(89,587)_r, 
pBAC_HR(89,061)-100kb/Lux-sC-HR(192,744)_r, and 
pBAC_HR(89,061)-20kb-sC-HR(106,508)_r with matching 
pKW21_MBlamp_Spacers × 2 or pKW21_MBlamp_Spacers × 4 plasmids were used in the other 
REXER experiments following the same protocol. Colony PCR of the lux operon 
and the coupled −2/+2 sacB-CmR cassette inserted at the genomic locus 89,061- 
89,587 using the primer pair flanking the genomic locus generated a 9-kb band 
for successful insertion and a 1.5-kb band for the MDS42"* control. Primer pairs 
flanking the 5’ or 3’ end of the inserted or replaced DNA were used, which gen- 
erated a PCR band for correct insertion or replacement, and no band or a band 
of the wrong size with the MDS42"* control. Colony PCR using primers for the 
internal watermarks was also performed. The 20-kb recoded region (from 89,062 
to 106,507) was PCR amplified from purified genomic DNA (QIAGEN DNeasy 
Blood & Tissue Kit) using a primer pair flanking the whole region. The 20-kb PCR 
product was purified using Bio-Rad PCR Kleen Columns and fully sequenced by 
Sanger sequencing. 

Choice of region for systematic and defined synonymous recoding. We applied 
a sliding window approach, in which we counted the number of target codons in 
all essential genes within a 10-kb region of the MDS42 genome. Starting from the 
first 10 kb of the genome sequence, we iteratively shifted the window by 100 nt and 
performed the codon analysis until the end of the MDS42 genome sequence. Gene 
essentiality was defined by transposon insertion densities from a genome-scale 
genetic footprinting study in E. coli*’, which led to comparable results to those 
obtained when we used the KEIO collection data‘. 
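
The sliding-window scan described above can be sketched as follows. This is an illustrative outline only: it assumes the positions of target codons within essential genes have already been extracted into a list, and the function names, the toy genome length, and the toy positions are hypothetical. The 10-kb window and 100-nt step follow the text.

```python
from bisect import bisect_left


def score_windows(genome_length, target_positions, window=10_000, step=100):
    """Count target codons (in essential genes) in each window of the genome.

    target_positions: genomic coordinates of target codons that fall inside
    essential genes (assumed to be precomputed).
    Returns a list of (window_start, count) tuples.
    """
    positions = sorted(target_positions)
    scores = []
    for start in range(0, genome_length - window + 1, step):
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + window)  # window is [start, start + window)
        scores.append((start, hi - lo))
    return scores


def best_window(scores):
    """Return the first window with the highest count."""
    return max(scores, key=lambda s: s[1])


# Toy example: a 50-kb "genome" with a cluster of five target codons.
toy_positions = [1_000, 20_000, 20_100, 20_200, 20_300, 20_400, 45_000]
toy_scores = score_windows(50_000, toy_positions)
toy_best = best_window(toy_scores)
print(toy_best)
```

Sorting the positions once and using binary search keeps each window evaluation cheap, which matters when the window is shifted by only 100 nt across a genome of several megabases.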

Choice of recoding rules. We characterized all serine, leucine, and alanine codons 
using the codon adaptation index (cAi)*? and the tRNA adaptation index (tAi)*!"”. 
In the case of cAi, we used the relative adaptiveness of each codon i (expressed as 
cAiw_i) as a metric. In the case of tAi, we used the relative adaptiveness value of each 
codon i (expressed as tAiw_i) in Supplementary Table 27!**. We defined ideal 
synonymous substitutions for targeted codons by minimising the difference between 
cAiw_i and tAiw_i. Comparing cAiw_i and tAiw_i for all codons, we noticed that the 
two metrics did not correlate well (Pearson’s R² = 0.24) and decided to propose 
a third metric. In particular, we assumed that translation efficiency increases 
proportionally with increasing isoacceptor tRNA concentration and decreases 
proportionally with increasing numbers of competing codons that are translated 
by the same isoacceptor tRNA. On this basis we defined the translation efficiency 
(tE) of codon i as follows: 


$$k = \begin{cases} 1.0 & \text{cognate} \\ 0.5 & \text{G–U/U–G wobble} \\ 0.25 & \text{C/U–xo}^5\text{U} \\ 0.1 & \text{C/U–inosine} \\ 0.05 & \text{A–inosine} \end{cases}$$
where codon i is translated by tRNAs j, k_ij denotes the interaction strength between 
codon i and tRNA j, m denotes each codon translated by tRNA j, and k_mj denotes 
the interaction strength between codon m and tRNA j. The interaction strengths 
were defined in five groups: i) ‘cognate’ for codons that are reverse complements 
to the respective tRNA anticodon as well as AUA^Ile–k²CAU³⁴; ii) ‘G–U/U–G 
wobble’ for codons where a third position G or U interacts with a (modified) tRNA 
U or G, respectively; iii) ‘C/U–xo⁵U’ for codons where a third position C or U 
interacts with an xo⁵-modified uridine in the tRNA anticodon; iv) ‘C/U–inosine’ 
where a third position C or U in the codon interacts with an inosine in the tRNA 
anticodon (an interaction shown to be 3-8-fold weaker than G-U wobbling”); and v) 
‘A–inosine’ for the reportedly weak interaction between the third position A in 
a codon with an inosine in the tRNA anticodon*’. We obtained the tRNA 
concentrations [tRNA] from reported measurements performed on E. coli cultures, 
expressed as a fraction of tRNA out of total tRNA in per cent. To determine the 
relative transcriptomic codon frequency q_i for each codon i we first calculated the 
codon’s absolute transcriptomic frequency r_i: 


$$r_i = \sum_x g_{ix} \times t_x$$


where g_ix is the frequency of codon i in gene x and t_x is the transcript abundance 
of gene x according to empirical data (DNA array data for wild-type E. coli grown 
at 0.5 h⁻¹)*. Finally r_i was transformed into q_i by dividing r_i by the maximal value 
found for r across all codons: 


$$q_i = \frac{r_i}{\max(r)}.$$

Using the three coding metrics, we constructed Extended Data Table 1 by 
assigning the closest substitutions (in pink) for synonymous recoding of TCA^S and 
TCG^S (in grey). 
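
The frequency normalisation and the tE metric can be illustrated with a short sketch. The functions for r_i and q_i follow the definitions above directly. The tE function assumes the form tE_i = Σ_j k_ij [tRNA_j] / Σ_m k_mj q_m, which is one reading of the stated proportionalities and symbol definitions and should be treated as an assumption. All function names and all numerical inputs (gene names, counts, abundances, tRNA concentration) are hypothetical.

```python
def relative_codon_frequency(codon_counts, transcript_abundance):
    """Compute q_i = r_i / max(r), with r_i = sum over genes x of g_ix * t_x."""
    r = {}
    for gene, counts in codon_counts.items():
        for codon, g_ix in counts.items():
            r[codon] = r.get(codon, 0.0) + g_ix * transcript_abundance[gene]
    r_max = max(r.values())
    return {codon: value / r_max for codon, value in r.items()}


def translation_efficiency(codon, k, trna_conc, q):
    """Assumed form: tE_i = sum_j k_ij * [tRNA_j] / sum_m (k_mj * q_m).

    k maps each tRNA j to {codon m: interaction strength k_mj}.
    """
    total = 0.0
    for trna, strengths in k.items():
        if codon not in strengths:
            continue  # this tRNA does not decode the codon
        competition = sum(k_mj * q[m] for m, k_mj in strengths.items())
        total += strengths[codon] * trna_conc[trna] / competition
    return total


# Hypothetical toy inputs: two genes and one serine tRNA (anticodon GCT),
# which reads AGC as cognate (1.0) and AGT by G-U wobble (0.5).
codon_counts = {"geneA": {"AGC": 2, "AGT": 1}, "geneB": {"AGC": 1, "AGT": 3}}
transcript_abundance = {"geneA": 10.0, "geneB": 2.0}
k = {"Ser-GCT": {"AGC": 1.0, "AGT": 0.5}}
trna_conc = {"Ser-GCT": 2.0}

q = relative_codon_frequency(codon_counts, transcript_abundance)
te_agc = translation_efficiency("AGC", k, trna_conc, q)
te_agt = translation_efficiency("AGT", k, trna_conc, q)
print(q, te_agc, te_agt)
```

Under this assumed form, two codons read by the same single tRNA differ in tE only by the ratio of their interaction strengths, because they share the same tRNA concentration and the same competition term.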

Individual recoding landscapes and compiled recoding landscapes. Based on 
the complete sequence of the recoded region for each clone, the identity of the 
codon at each of the attempted recoding positions and the duplicated region 1 to 5 
was identified either as recoded, with a binary value of 1 and coloured red, or wild 
type, with a value of 0 and coloured black. The distribution of targeted positions 
that were recoded and that remained wild type across the refactored mraZ to ftsZ 
region gives an ‘individual recoding landscape’. 

The 16 individual recoding landscapes of the 16 individual clones of each of the 
recoding schemes were compiled to generate the ‘compiled recoding landscape’ of 
each recoding scheme by counting the fraction of clones being recoded at each 
targeted position across the whole refactored mraZ to ftsZ region. When the recoding 
fraction at a given position is greater than 0 (red), it indicates that there is at least 
one sequenced clone being recoded at this position. When the recoding fraction 
reaches 0 (black), it indicates that the wild-type codon always remains and that the 
recoded codon may not be tolerated at these positions. 
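
Compiling the landscapes amounts to averaging binary vectors position by position. The sketch below assumes each clone is represented as a list of 0 (wild type) and 1 (recoded) values over the targeted positions; the function names and the toy data (four clones, five positions) are hypothetical.

```python
def compile_landscape(individual_landscapes):
    """Fraction of clones recoded at each targeted position."""
    n_clones = len(individual_landscapes)
    n_positions = len(individual_landscapes[0])
    if any(len(clone) != n_positions for clone in individual_landscapes):
        raise ValueError("all clones must cover the same targeted positions")
    return [
        sum(clone[p] for clone in individual_landscapes) / n_clones
        for p in range(n_positions)
    ]


def never_recoded(compiled):
    """Indices of positions where the wild-type codon always remains."""
    return [p for p, fraction in enumerate(compiled) if fraction == 0]


def fully_recoded_fraction(individual_landscapes):
    """Fraction of clones recoded at every targeted position."""
    full = sum(1 for clone in individual_landscapes if all(clone))
    return full / len(individual_landscapes)


# Toy data: position index 2 is never recoded in any clone.
toy_clones = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
]
toy_compiled = compile_landscape(toy_clones)
toy_blocked = never_recoded(toy_compiled)
toy_full = fully_recoded_fraction(toy_clones)
print(toy_compiled, toy_blocked, toy_full)
```

A position with a compiled fraction of 0 is a candidate idiosyncratic site of the kind discussed for ftsA codon 407.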

REXER + ssDNA recombineering protocol. A single-stranded oligo of a total 
length of 90 nt was designed and synthesized to change the deleterious sequence 
of AGT in ftsA codon position 407 on the synthetic sequence of r.s.1 to a fixing 
sequence of AGC. The oligo sequence was designed based on the reverse strand of 
the synthetic sequence to bind the forward strand with the single nucleotide change 
positioned in the middle (position 45 from 5′ end). The last two nucleotides in the 
5′ end of the oligo were substituted with a phosphorothioate backbone to protect 
the oligo from unspecific exonuclease degradation. 
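
The oligo layout described above can be sketched in a few lines. This is an illustration under stated assumptions: the template sequence is invented, the function names are hypothetical, and the phosphorothioate modification is not modelled (it would apply to the two nucleotides at the 5′ end of the returned sequence).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_fixing_oligo(forward, mut_index, new_base, length=90, centre=45):
    """Design a reverse-strand oligo carrying a single nucleotide change.

    forward:   forward-strand template sequence
    mut_index: 0-based index of the base to change on the forward strand
    centre:    1-based position of the change counted from the oligo 5' end
    """
    start = mut_index - (length - centre)
    end = start + length
    if start < 0 or end > len(forward):
        raise ValueError("template too short around the mutation")
    edited = forward[:mut_index] + new_base + forward[mut_index + 1:]
    return reverse_complement(edited[start:end])


# Invented template with an AGT codon whose third base (index 102) becomes C.
template = "ACGGTCAATG" * 10 + "AGT" + "CCATGGTTAC" * 10
oligo = design_fixing_oligo(template, mut_index=102, new_base="C")
print(oligo)
```

Because the oligo is the reverse complement of the edited forward strand, the changed base appears as its complement (G for the introduced C) at position 45 from the 5′ end.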

We co-electroporated 3 μg of matching pKW21_MBlamp_Spacers × 2 plasmid 
and 0.2 nmol of the fixing oligo into the pre-induced MDS42^rpsLK43R/rK cells 
with pCDFtet_pAraRedCas9_tracrRNA and pBAC_HR(89,061)-20kb/r.s.1- 
sC-HR(106,508)_r. The normal REXER procedure was carried out without any 
modification. SNP-PCRs from re-suspended surviving colonies were performed*® 
using the SNP-PCR primer pairs specific for either the ftsA codon position 407 
wild-type sequence TCG or the fixed sequence AGC” with KAPA 2G fast multiplex 
mix, and analysed on QIAGEN QIAxcel Advanced using QIAxcel DNA Screening 
Kit with QX Alignment Marker 15 bp/5 kb and QX Size Marker 250-4000 bp. 
Clones with the correct genotype following REXER + ssDNA recombineering 
were verified by Sanger sequencing through the entire 20-kb region. 

Growth rate measurements and analysis. Glycerol stocks of the assayed 
bacterial clones were used to inoculate 5 ml LB in the absence of antibiotics for 
overnight incubation at 37°C with shaking. The overnight cultures were used 
to inoculate triplicates of 1 ml LB in a deep-well pre-culture plate at a ratio 
of 1:100, followed by incubation at 37°C for 6 h with shaking. Each replicate 
on the pre-culture plate was used to inoculate 200 μl LB in a 96-well 
measurement plate with a dilution factor of 100. The measurement plate was 
incubated at 37°C for 16 h with shaking at 400 rpm in an M200 Pro Plate Reader 
(Tecan). Readings of OD600 were taken for each well every 10 min. Plate reader 
absorbance data were adjusted to correspond to spectrophotometer readings 
by collecting measurements from a dilution series of bacterial cultures and 
fitting the plate reader data y with a polynomial to the spectrophotometer 
values x: y = 2.053x² + 2.2x + 0.061. 

To determine doubling times, the growth curves were log2-transformed, and 
the first derivative was determined (d(log2(x))/dt). During exponential growth, 
the log2-derivative is maximal and constant. The ten time-points with the maximal 
log2-derivatives were identified and used to calculate the average value with 
standard deviation for each of the replicates. A total of 12 replicates (three independent 
clones, each independently repeated four times) were used to calculate the doubling 
time for each fully recoded scheme. For each doubling time, the average across the 
n = 12 replicates was determined and the error σ was propagated using the following 
formula, in which σ_i is the standard deviation of replicate i: 

$$\sigma = \frac{1}{n}\sqrt{\sum_{i=1}^{n} \sigma_i^{2}}$$
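
The doubling-time analysis can be sketched with the standard library alone. Several details here are assumptions rather than the authors' stated procedure: the derivative is taken as a finite difference between consecutive readings, the doubling time is computed as the reciprocal of the mean log2-derivative, and the error across replicates is combined as the square root of the summed variances divided by n. Function names and the synthetic growth curve are hypothetical.

```python
import math
from statistics import mean, stdev


def max_log2_rate(times_h, od, n_points=10):
    """Mean and s.d. of the n_points largest log2-derivatives (doublings per hour)."""
    log2_od = [math.log2(v) for v in od]
    slopes = [
        (log2_od[i + 1] - log2_od[i]) / (times_h[i + 1] - times_h[i])
        for i in range(len(od) - 1)
    ]
    top = sorted(slopes, reverse=True)[:n_points]
    return mean(top), stdev(top)


def doubling_time_h(rate):
    """Doubling time in hours from a log2 growth rate (assumed reciprocal relation)."""
    return 1.0 / rate


def propagate_error(sds):
    """Assumed propagation for a mean of n values: sqrt(sum of sd_i^2) / n."""
    return math.sqrt(sum(s * s for s in sds)) / len(sds)


# Synthetic curve: readings every 10 min, true doubling time of 0.5 h.
times = [i / 6 for i in range(30)]
od = [0.01 * 2 ** (t / 0.5) for t in times]

rate, rate_sd = max_log2_rate(times, od)
td = doubling_time_h(rate)
combined = propagate_error([0.3, 0.4])
print(td, rate_sd, combined)
```

On real plate-reader data the readings would first be converted to spectrophotometer-equivalent values, and the ten steepest points would select the exponential phase rather than the whole curve.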
i=1 

Data availability. The sequences used in this study are available in Supplementary 
Data 1-5. Supplementary Data 1: Genomic locus for selection marker —1/+1; 
Supplementary Data 2: Plasmid containing lambda red, Cas9, and tracrRNA; 
Supplementary Data 3: BAC containing lux operon and −2/+2 for integration; 
Supplementary Data 4: Plasmid containing spacers for REXER 2; Supplementary 
Data 5: Plasmid containing spacers for REXER 4. All other datasets generated and/ 
or analysed during the current study are available from the corresponding author 
on reasonable request. 

Extended Data Figure 1 | Simultaneous double selection and 
recombination enhances integration at a target locus. a, Classical 
recombination and double selection recombination. In classical 
recombination, a linear dsDNA with a synthetic DNA (s. DNA) sequence 
and a positive selection marker (+, Cm) flanked by homologous region 
1 (HR1) and homologous region 2 (HR2) is transformed into the cell. 
Recombinants are selected by expression of the positive selection marker. 
By simultaneous double selection recombination, s. DNA containing the 
double selection marker −2/+2 (sacB-CmR) is integrated in place of the 
double selection marker −1/+1 (rpsL-KanR) on the genome. Double 
selection for the gain of +2 and loss of —1 selects for simultaneous gain 
of s. DNA and loss of genomic sequence, and improves recombination 
at the target genomic locus. b, Colony PCR of clones from classical 
recombination and simultaneous double selection and recombination. 
c, All of the clones isolated by simultaneous double selection and 
recombination have s. DNA integrated at the target locus. The data show 
the mean percentage ± s.d. at the correct locus (n = 6, three biological 
replicates each performed in two technical replicates, for each technical 
replicate 8 clones were phenotyped). P < 10⁻⁴, two-tailed t-test for the null 
hypothesis that classical recombination is as efficient as double selection 
recombination, REXER 2 or REXER 4. d, The data for simultaneous 
double selection recombination, REXER 2 and REXER 4 show the mean 
percentage ± s.d. at the correct locus (n = 6, three biological replicates 
each performed in two technical replicates, for each technical replicate 
eight independent clones were phenotyped). The data for B. subtilis and 
S. cerevisiae are from previous publications. A previously reported method 
integrating foreign DNA into B. subtilis genome only using negative 
selection gave 3% (9 out of 271) of selected clones with right combination 
of markers*!'. A previously reported method replaced the S. cerevisiae 
chromosome III in 11 steps using only positive selection. The efficiency, 
as judged by clones with the correct combination of markers, was reported 
for ten of these steps; the mean percentage of clones with the right 
combination of markers is plotted (13%). The error bar represents the 
maximum and minimum integration efficiency as judged by clones with 
the correct combination of markers. The minimum efficiency was 0.5% 
(replacement of 55 kb), the maximum efficiency was 59% (replacement of 
9kb)®. For gel source images, see Supplementary Fig. 1. 

Extended Data Figure 2 | REXER enables site-specific integration of 
large DNA fragments into the genome. a, The use of two distinct double 
selection cassettes (−1/+1 (rpsL-KanR) and −2/+2 (sacB-CmR)) allows 
simultaneous selection for the loss of the negative selection marker on 
the genome and the gain of the positive selection marker from the BAC 
upon integration of synthetic DNA. b, Efficient replacement of genomic 
rpsL-KanR with BAC-bound sacB-CmR using REXER 2 and REXER 4. 
All colonies tested (n = 22) contained the correct combination of selection 
markers after REXER 2 or REXER 4 as analysed by phenotyping, colony 
PCR, and DNA sequencing (not shown). c, Efficient insertion of 9-kb 
synthetic DNA. Genomic rpsL-KanR was replaced with a synthetic lux 
operon coupled to sacB-CmR using REXER 2 and REXER 4. All colonies 
on the tenfold dilution double selection plates for REXER 2 and the 

10*- fold plates for REXER 4 show bioluminescence. Eleven colonies each 
from REXER 2 and REXER 4 showed correct integration by phenotyping, 
colony PCR, and DNA sequencing (not shown). d, Efficient insertion of 
90-kb synthetic DNA. The 90-kb DNA consisted of the lux operon in 
the middle of 80-kb DNA (previously deleted from the MDS42 genome) 
and followed by sacB-CmR, carried on a BAC. For gel source images, 
see Supplementary Fig. 1. 
Extended Data Figure 3 | Replacement of 100 kb of genomic DNA via
REXER. a, The synthetic DNA contains the 100-kb wild-type DNA
(open reading frames in grey) with five genes of the lux operon (blue)
and sacB-CmR. Complete replacement leads to integration of all five
lux genes (luxA, B, C, D and E) resulting in bioluminescent cells, while
partial replacement confers loss of one or more lux genes and loss of
bioluminescence. b, After REXER 2, 80% of 2 x 107 colonies examined
were bioluminescent; after REXER 4, 50% of 2 x 10? colonies examined
were bioluminescent. c, Eleven bioluminescent colonies from REXER
2 and eleven bioluminescent colonies from REXER 4 were analysed.
All colonies analysed had all five lux genes correctly integrated, indicating
complete replacement of the 100-kb genomic region. All clones analysed
contained the right combination of selection markers. d, Eleven
bioluminescent colonies from REXER 2 and eleven non-bioluminescent
colonies from REXER 2 were analysed. While bioluminescent colonies
contained all five lux watermarks, all the non-bioluminescent colonies
analysed were lacking one or more lux genes, indicating partial
replacement of the genomic region. All clones analysed contained
the right combination of selection markers. For gel source images,
see Supplementary Fig. 1.
Extended Data Figure 4 | Iterative REXER. a, The product of REXER shown in Extended Data Fig. 2a was used as a template for the next round of
REXER. b, The phenotypes of clones from the first round of REXER. c, The phenotypes of clones from the second round of REXER. For gel source
images, see Supplementary Fig. 1.
Extended Data Figure 5 | Synonymous codon compression strategies.
a, Codon and anticodon interactions in the E. coli genome. Twenty-eight
sense codons are highlighted in grey, along with the amber stop
codon. The genome-wide removal of these sense codons, but not other
sense codons, would enable all their cognate tRNAs to be deleted without
removing the ability to decode one or more sense codons remaining in
the genome. This is necessary but not sufficient for the reassignment
of sense codons to unnatural monomers. Serine, leucine and alanine
codon boxes are highlighted because the endogenous aminoacyl-tRNA
synthetases for these amino acids do not recognize the anticodons of their
cognate tRNAs. This may facilitate the assignment of codons within these
boxes to new amino acids through the introduction of tRNAs bearing
cognate anticodons that do not direct mis-aminoacylation by endogenous
synthetases. The total codon counts for all 64 triplet codons in
the MDS42 genome (GenBank accession number AP012306), all known
codon-anticodon interactions through both Watson-Crick base-pairing
and wobbling, base modifications on tRNA anticodons, tRNA genes, and
measured in vivo relative tRNA abundances are reported. This analysis
identifies 10 codons from the serine, leucine and alanine groups (serine
codons TCG, TCA, AGT, AGC; leucine codons CTG, CTA, TTG, TTA;
and alanine codons GCG, GCA) that satisfy both the codon-anticodon
interaction and aminoacyl-tRNA synthetase recognition criteria for
codon reassignment. b-d, Serine, leucine and alanine codon removal
and tRNA deletion strategies compatible with codon reassignment to
unnatural amino acids (u.a.a.).
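
As a rough illustration of what synonymous codon compression means at the sequence level, the sketch below swaps two of the target serine codons for other serine codons and checks that the encoded peptide is unchanged. The mapping (TCG to AGC, TCA to AGT), the toy open reading frame and the helper names are illustrative assumptions, not the recoding schemes used in this work.

```python
# Minimal sketch of synonymous recoding of serine codons.
# RECODING_MAP and TOY_ORF are hypothetical and chosen only for illustration.

# Partial codon table covering just the codons used in the toy example.
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "CTG": "L", "AAA": "K", "TAA": "*",
    "TCG": "S", "TCA": "S", "TCT": "S", "TCC": "S", "AGC": "S", "AGT": "S",
}

# Hypothetical compression rule: remove TCG and TCA, keep serine.
RECODING_MAP = {"TCG": "AGC", "TCA": "AGT"}

TOY_ORF = "ATGTCGGCTTCACTGAAATCGTAA"


def split_codons(seq):
    """Split an in-frame DNA sequence into a list of triplets."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate using the partial table above."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


def recode(seq, mapping):
    """Replace every target codon with its synonymous substitute."""
    return "".join(mapping.get(codon, codon) for codon in split_codons(seq))


def count_targets(seq, mapping):
    """Count codons that are still scheduled for removal."""
    return sum(1 for codon in split_codons(seq) if codon in mapping)


recoded_orf = recode(TOY_ORF, RECODING_MAP)
print(count_targets(TOY_ORF, RECODING_MAP), "target codons before recoding")
print(count_targets(recoded_orf, RECODING_MAP), "target codons after recoding")
print("peptide unchanged:", translate(TOY_ORF) == translate(recoded_orf))
```

The check on the translated peptide is the defining property of a synonymous scheme; whether a given substitution is tolerated in vivo is a separate, experimental question.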
Extended Data Figure 6 | Recoding landscapes for compression of serine codons by REXER. a, The sequences for the systematically recoded mraZ to
ftsZ region were de novo designed, synthesized and assembled into a BAC and used for REXER. b-d, The recoding landscapes for serine recoding schemes
(r.s.) 1-3, and the resulting compiled recoding landscape.
Extended Data Figure 8 | Identifying and fixing a deleterious sequence
in defined and systematic synonymous recoding. a, Recoding codon
407 in ftsA in the wild-type genomic background. The wild-type codon
at ftsA codon position 407 is the serine codon TCG. We sequenced 16
post-REXER clones for TCG to AGT and 20 post-REXER clones for TCG
to TCT. b, Changing ftsA 407 AGT to AGC in the serine r.s.1 background.
We sequenced 16 AGT clones and 16 AGT to AGC clones. c, Changing
ftsA 407 AGT to AGC in the serine r.s.1 background greatly improved
the fraction of fully recoded clones across the entire 20-kb region from
0% to 94% (16 clones sequenced). d, The fixed serine r.s.1 with ftsA 407
AGC yielded clones with no measurable growth defect. The doubling
times of fully recoded clones from serine r.s.1 with ftsA 407 AGC,
serine r.s.2, serine r.s.3, and alanine r.s.7 were measured and showed no
measurable growth defects when compared to the wild-type MDS42 E. coli
control with the second double selection cassette integrated at the same
genomic locus. (The P values for the null hypothesis that the doubling
time of each recoded clone does not differ from that of the wild-type
control were calculated by two-tailed t-tests. Serine r.s.1 ftsA 407 AGC
versus wild type, P = 0.54; serine r.s.2 versus wild type, P = 0.62; serine
r.s.3 versus wild type, P = 0.39; alanine r.s.1 versus wild type, P = 0.47.)
n = 12 biological replicates and error bars show s.d. e, Combining
single-strand DNA recombineering with REXER to fix a short deleterious
stretch within the synthetic sequence of r.s.1. A 90-nt single-stranded
oligonucleotide was designed to change the deleterious sequence of AGT
in ftsA codon position 407 in r.s.1 to a tolerated sequence, AGC. The
oligonucleotide sequence was designed based on the reverse strand of the
synthetic sequence to bind the forward strand with the single nucleotide
change positioned in the middle (45 nt from the 5' end). The oligonucleotide
was co-transformed into E. coli during a REXER experiment that introduced
r.s.1 into the genome. f, Fixing a short deleterious sequence on synthetic
DNA with REXER + ssDNA recombineering. Sixteen clones from REXER
double selection (described in e) were randomly picked and subjected to
single nucleotide polymorphism (SNP) genotyping using primers specific
for either the wild-type sequence in ftsA codon position 407 (TCG) or the
fixed sequence (AGC). MDS42rpsLK43R was used as the wild-type control
and a fully recoded clone from serine r.s.3 with verified ftsA 407 AGC as 
the positive control. SNP genotyping at ftsA codon position 407 identified 
one clone (clone 12, highlighted in orange) out of a total of 16 clones 
tested with fixed sequence AGC, which was then fully sequenced across 
the entire 20-kb recoding region and confirmed as fully recoded at all 83 
targeted codon positions. For gel source images, see Supplementary Fig. 1. 
Extended Data Table 1 | Defining recoding rules by codon adaptation index (cAi), tRNA adaptation index (tAi), and translation efficiency (tE)

[Table body not recoverable from the extracted text. Panels a (serine), b (leucine) and c (alanine) each list Metric and Substitution columns for cAi, tAi and tE.]

We defined the best synonymous replacements for the target serine (a), leucine (b), and alanine codons (c) by identifying the closest match for the target codons, as judged by codon adaptation index 
(cAi), tRNA adaptation index (tAi), or a third metric that combines codon abundance and measured tRNA concentrations to estimate translation efficiency (tE) (see Methods). The table assigns the 
closest substitutions (in pink) for synonymous recoding of targeted codons (in grey) using the three coding metrics. Where two substitutions are comparable the one that conserves G, C content is 
chosen. The number in bold is the value of the best matching substitution in a given coding metric. 
Extended Data Table 2 | Properties of genes targeted for recoding

Total number of target codons: 14 | 14

Gene | Function | Localization | Expression (ppm) | ORF length (bp) | Peptide length (aa) | Target codons
rplV | Protein translation | cytosol | 7848.2 | 333 | 111 | 0
rplS | Protein translation | cytosol | 3859.3 | 348 | 116 | 0
rplR | Protein translation | cytosol | 6367.3 | 354 | 118 | 0
rplT | Protein translation | cytosol | 3291.4 | 357 | 119 | 0
rpsM | Protein translation | cytosol | 5733.1 | 357 | 119 | 1
rplL | Protein translation | cytosol | 14543.5 | 366 | 122 | 0
rplN | Protein translation | cytosol | 8866.6 | 372 | 124 | 1
rpsL | Protein translation | cytosol | 5532.8 | 375 | 125 | 0
rplQ | Protein translation | cytosol | 4272.8 | 384 | 128 | 0
rpsK | Protein translation | cytosol | 2900.5 | 390 | 130 | 1
rpsH | Protein translation | cytosol | 3828.3 | 393 | 131 | 0
rpsI | Protein translation | cytosol | 3410.8 | 393 | 131 | 0
rplP | Protein translation | cytosol | 3778.1 | 411 | 137 | 0
rplM | Protein translation | cytosol | 4268.0 | 429 | 143 | 0
[illegible] | Protein translation | cytosol | 5111.6 | 435 | 145 | 1
[illegible] | Protein translation | cytosol | 7731.6 | 498 | 166 | 1
rpsE | Protein translation | cytosol | 7657.3 | 504 | 168 | 0
rplF | Protein translation | cytosol | 5012.1 | 534 | 178 | 0
rpsG | Protein translation | cytosol | 8660.2 | 540 | 180 | 0
[illegible] | Protein translation | cytosol | 3489.1 | 540 | 180 | 0
rplD | Protein translation | cytosol | 3469.9 | 606 | 202 | 0, 0
rpsD | Protein translation | cytosol | 5187.4 | 621 | 207 | 1, 1
rplC | Protein translation | cytosol | 4460.3 | 630 | 210 | 0, 0
rpsC | Protein translation | cytosol | 5755.0 | 702 | 234 | 0, 0
rpsB | Protein translation | cytosol | 4324.5 | 726 | 242 | 0, 0
rplB | Protein translation | cytosol | 5658.4 | 822 | 274 | 1, 1, 1, 1
prfB | Protein translation | cytosol | 570.9 | 1099 | 366 | 0, 0, 4
rpsA | Protein translation | cytosol | 2649.1 | 1674 | 558 | 0, 0

Protein functions, localizations, expression levels (in parts per million), and lengths (both ORF length in bp and peptide length in amino acid count) of the genes in the essential cell division operon all
simultaneously recoded in this work (a), and of individually recoded ribosomal and release factor 2 genes reported previously?’ (b). The numbers of codons targeted for removal according to different
recoding schemes are also reported. The expression level data are from http://www.pax-db.org.

A unique feature of Pluto’s large satellite Charon is its dark red
northern polar cap!. Similar colours on Pluto’s surface have been 
attributed? to tholin-like organic macromolecules produced by 
energetic radiation processing of hydrocarbons. The polar location 
on Charon implicates the temperature extremes that result from 
Charon’s high obliquity and long seasons in the production of this 
material. The escape of Pluto’s atmosphere provides a potential 
feedstock for a complex chemistry**. Gas from Pluto that is 
transiently cold-trapped and processed at Charon’s winter pole was 
proposed!” as an explanation for the dark coloration on the basis 
of an image of Charon’s northern hemisphere, but not modelled 
quantitatively. Here we report images of the southern hemisphere 
illuminated by Pluto-shine and also images taken during the 
approach phase that show the northern polar cap over a range of 
longitudes. We model the surface thermal environment on Charon 
and the supply and temporary cold-trapping of material escaping 
from Pluto, as well as the photolytic processing of this material 
into more complex and less volatile molecules while cold-trapped. 
The model results are consistent with the proposed mechanism for 
producing the observed colour pattern on Charon. 

The most prominent colour feature in New Horizons images of 
Charon is its reddish northern polar cap. Figure 1a combines the blue 
(400-550 nm), red (540-700 nm) and near-infrared (NIR; 780-975 nm) 
channels from the Multispectral Visible Imaging Camera (MVIC), part 
of New Horizons’ Ralph remote sensing package™®. Figure 1b shows 
how reflectance in these three filters varies with latitude, averaged 
over the longitudes shown in Fig. 1a. Charon’s surface gets darker and
redder towards higher latitudes. Colour ratios of NIR/red and red/ 
blue show similar latitude dependences to one another’, suggesting a 
single-pigment material with increasing abundance towards the pole. 
Additional approach images are shown in Extended Data Fig. 1. 

Longitudinal variability in the NIR/blue colour ratio is shown in 
Fig. 1c. At higher latitudes, colours are redder across all of the longitudes 
observed, although the trend is not perfectly uniform. Deviations may 
be related to local variations in topography or other parameters. The 
red coloration is interrupted by a few impact craters with diameters of 
several kilometres. Impacts that size occur rarely, probably much less 
frequently than once every million years’, so their existence implies that 
the red material must accumulate slowly (see Methods). 

Infrared spectroscopy also supports slow accumulation. Spectra of 
Charon’s pole are dominated by H2O ice absorptions similar to spectra
of lower-latitude regions”. The data are consistent with up to 10% tholin
mixed with H2O ice at the millimetre depths that are probed by the
infrared observations, so tholin deposition must occur slowly enough
that H2O resupply or upward mixing of H2O by impact gardening can
compete. 

Charon’s south pole is currently in winter night. New Horizons 
observed the night-side with the Long Range Reconnaissance Imager 
(LORRI®) approximately 2.6 days after the closest approach, illuminated 
by Pluto-shine. The images reveal a decreasing brightness towards the 
pole (Fig. 2 and Extended Data Fig. 2) that cannot be attributed to 
the declining illumination alone, but requires a decreased albedo of the 
pole relative to equatorial latitudes (see Methods). The sunlit northern 
hemisphere in LORRI approach images shows a comparable albedo 
decline towards that pole (Fig. 2d). 

Charon’s surface temperature responds to solar forcing on diurnal 
(6.39 Earth days) and annual (248 Earth years) timescales. The high 
obliquity (currently 119°) causes polar latitudes to experience long 
periods of continuous darkness, during which they become extremely 
cold. Complicating the situation, the eccentricity of Pluto’s heliocentric 
orbit, currently 0.253, results in a factor of 2.8 difference in the 
intensity of sunlight between perihelion and aphelion. To assess the 
thermal history of Charon’s surface, we ran thermophysical models? 
that track the diurnal and annual vertical heat flow into and out of 
Charon’s surface at different latitudes, accounting for the present-day 
orbital parameters!® (see Methods). Figure 3a shows model thermal 
histories for four different northern hemisphere latitudes over the 
past few centuries for a nominal thermal inertia of 10 J m^-2 K^-1 s^-1/2,
which is consistent with estimates of Charon’s diurnal thermal inertia 
from Herschel Space Telescope observations!'. Models for bracketing 
thermal inertias (2.5 and 40 J m^-2 K^-1 s^-1/2) are shown in Extended
Data Fig. 3, spanning the range reported from icy satellites’? to Kuiper 
belt objects'*. Charon’s north pole experienced more than a century of 
continuous, extremely low temperatures from the late 1800s through to 
the spring equinox in 1989. Lower latitudes experienced briefer periods 
of continuous extreme cold, and also reached less extreme minimum 
temperatures. 
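
The factor of 2.8 quoted above follows from the inverse-square law applied to the perihelion and aphelion distances of an eccentric orbit. The short check below is a sketch of that arithmetic; the function name is illustrative, and only the eccentricity is taken from the text.

```python
# Sketch: ratio of sunlight intensity at perihelion to that at aphelion.
# Distances are a(1 - e) and a(1 + e); intensity scales as 1 / r^2,
# so the semi-major axis a cancels out of the ratio.

def insolation_ratio(eccentricity):
    """Perihelion-to-aphelion intensity ratio for a Keplerian orbit."""
    return ((1.0 + eccentricity) / (1.0 - eccentricity)) ** 2


PLUTO_ECCENTRICITY = 0.253  # value quoted in the text

ratio = insolation_ratio(PLUTO_ECCENTRICITY)
print(f"perihelion/aphelion intensity ratio: {ratio:.1f}")  # 2.8
```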

The latest Charon year shown in Fig. 3a provides an incomplete
picture of Charon’s long-term thermal history because the pole
precesses and the longitude of perihelion regresses'®'4, both on
3-million-year (3-Myr) timescales. For an idea of Charon’s longer-term
thermal history, we ran thermal models for previous Charon years,
selecting one every 4 x 10°yr over the past 3 Myr. Durations of
continuous periods colder than a 25 K threshold temperature for cold-trapping
(derived below) were averaged over the modelled epochs and also over
the northern and southern hemispheres. These are shown as a function
of latitude in Fig. 3b. Higher latitudes experience longer periods below
the threshold temperature.

Figure 1 | Charon’s red northern pole. a, Polar stereographic projection
with Ralph’s BLUE, RED and NIR filter images displayed in blue,
green and red colour channels, respectively, relative to a Hapke
photometric model (see Methods). b, Latitude dependence of the
reflectance relative to the photometric model. c, Longitudinal dependence
of the NIR/BLUE colour ratio. d, Wavelength dependence at two latitudes
(coloured points) compared with spectral models of a laboratory tholin
plus a neutral material (grey curves). The vertical bars indicate the
standard deviation within each latitude bin; the horizontal bars indicate
the filter widths.

CH4 and N2 currently escape from Pluto at rates of 5 × 10^25 and
1 × 10^23 molecules s^-1, respectively, as estimated from New Horizons
data*, with the Pluto exobase located at around 2.5 Pluto radii (that is,
at an altitude of approximately 1,780 km)"». These are different from
conditions assumed in a pre-encounter study’, which investigated
the transfer of Pluto’s escaping atmosphere to Charon including
condensation on the winter pole. This model was based on an N2
escape rate from Pluto of 2.3 × 10^27 molecules s^-1, of which 5.7 × 10^25
molecules s^-1 (that is, around 2.5%) encountered Charon, leading to
N2 ice being deposited at a rate of 0.2 µm per decade on the winter
pole. More recent simulations!® report comparable arrival fractions.
The long-term temporal variability of the escape rate is not known.

Assuming that Charon intercepts 2.5% of the CH4 escape flow
determined by New Horizons data, that would correspond to a globally
averaged arrival rate of 2.7 × 10^11 molecules m^-2 s^-1 at Charon. We
estimate that most of the CH4 remains at Charon long enough to find
its way to the winter pole (see Methods). If it accumulates within 45°
of the winter pole (a typical polar size, see Fig. 3b), it would produce
a layer of CH4 ice that is approximately 0.3 µm thick at each pole of
Charon during the winter portion of the Pluto year. The accumulation
would presumably vary with latitude, being thicker towards the pole.

Condensation on a surface depends on the temperature and vapour
pressure'”. The pressure at Charon’s surface depends on the area of the
polar cold trap. For a 45° radius pole, it can be estimated as 1 × 10^-11 Pa.
This pressure corresponds to an equilibrium vapour pressure of CH4
ice at a temperature of 25 K (ref. 18). Where the surface temperature is
colder, CH4 will tend to freeze out as ice.
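
The globally averaged CH4 arrival rate can be checked with a few lines of arithmetic: the intercepted share of Pluto's escape flow divided by Charon's surface area. The sketch below does that, and also evaluates the share of a sphere lying within 45° of one pole. Charon's radius (606 km) is an assumed value that is not stated in the text; the escape rates and the 2.5% intercepted fraction are the quoted ones.

```python
import math

# Values quoted in the text.
CH4_ESCAPE_RATE = 5e25          # molecules per second leaving Pluto
INTERCEPTED_FRACTION = 0.025    # share of the flow that reaches Charon
PRE_ENCOUNTER_ESCAPE = 2.3e27   # N2 molecules per second (pre-encounter model)
PRE_ENCOUNTER_ARRIVAL = 5.7e25  # N2 molecules per second reaching Charon

# Assumption: Charon's mean radius, not given in the text.
CHARON_RADIUS_M = 606e3


def sphere_area(radius_m):
    """Surface area of a sphere in square metres."""
    return 4.0 * math.pi * radius_m ** 2


def polar_cap_fraction(cap_radius_deg):
    """Fraction of a sphere's area within a given angular radius of one pole."""
    return (1.0 - math.cos(math.radians(cap_radius_deg))) / 2.0


arrival_rate = CH4_ESCAPE_RATE * INTERCEPTED_FRACTION / sphere_area(CHARON_RADIUS_M)
cap_fraction = polar_cap_fraction(45.0)
model_fraction = PRE_ENCOUNTER_ARRIVAL / PRE_ENCOUNTER_ESCAPE

print(f"global mean arrival rate: {arrival_rate:.2e} molecules m^-2 s^-1")
print(f"45-degree cap covers about 1/{1.0 / cap_fraction:.1f} of the surface")
print(f"pre-encounter arrival fraction: {100 * model_fraction:.1f}%")
```

Under these assumptions the result lands close to the quoted 2.7 × 10^11 molecules m^-2 s^-1, and the cap fraction is close to one-seventh.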


Methane escaping from Pluto’s atmosphere is accompanied by other
minor species. These include about 0.2% N2, various C2 hydrocarbons
(at a few tens of parts per million) and radicals such as CH3 (at around
100 parts per million). N2 is more volatile and thus requires lower
temperatures to be cold-trapped, so it should freeze onto Charon’s
surface over a smaller range of latitudes and times of year than CH4
does. Heavier hydrocarbons can condense anywhere on Charon, but
would not produce enough of a deposit to be visible.

Our hypothesis requires energetic radiation to process the seasonally 
cold-trapped CH4. It is frozen on Charon’s surface only during the polar
winter night, so it must be processed rapidly, on the timescale of a 
century, and only by radiation impinging on the night side. It need not 
be fully converted into macromolecular solids such as tholins on such 
a short timescale, only into molecules that are sufficiently non-volatile 
to remain on the surface after the pole re-emerges into sunlight and 
warms back up. Charon’s surface is subject to a variety of energetic 
radiation sources, including ultraviolet photons, solar wind charged 
particles, interstellar pickup ions and galactic cosmic rays!?”°. The most 
important night-side source of energetic radiation appears to be solar 
ultraviolet Lyman alpha photons (Lyα, 10.2 eV) that have been scattered
by the interplanetary medium, with a photon flux of 3.5 × 10^11 m^-2 s^-1
on the night side. For 2.7 × 10^11 molecules m^-2 s^-1 arriving at Charon,
concentration by cold-trapping in a 45° cold pole (approximately 1/7th
of Charon’s surface area) results in the accumulation of around 3 nm
of ice per Earth year, of which we estimate 21% is photolysed (see
Methods). There would probably also be some loss to sputtering”!. N2
ice is unaffected by Lyα”* but some could be processed by cosmic rays.
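
The figure of roughly 3 nm of ice per Earth year can be reproduced by concentrating the global arrival rate onto one-seventh of the surface and converting molecules to a thickness. The sketch below assumes a CH4 ice density of 500 kg m^-3 and a molecular mass of 16.04 u; neither value is given in the text, so the result is only an order-of-magnitude check.

```python
# Values quoted in the text.
GLOBAL_ARRIVAL_RATE = 2.7e11    # molecules m^-2 s^-1, global average
COLD_POLE_AREA_FRACTION = 1 / 7
PHOTOLYSED_FRACTION = 0.21

# Assumptions that are not in the text.
CH4_ICE_DENSITY = 500.0         # kg m^-3, approximate
CH4_MOLECULAR_MASS_U = 16.04    # atomic mass units
ATOMIC_MASS_UNIT_KG = 1.6605e-27
SECONDS_PER_EARTH_YEAR = 3.156e7


def annual_ice_thickness_m(arrival_rate, area_fraction, density):
    """Thickness of CH4 ice deposited in one Earth year on the cold pole."""
    polar_flux = arrival_rate / area_fraction              # molecules m^-2 s^-1
    molecules_per_m2 = polar_flux * SECONDS_PER_EARTH_YEAR
    mass_per_m2 = molecules_per_m2 * CH4_MOLECULAR_MASS_U * ATOMIC_MASS_UNIT_KG
    return mass_per_m2 / density


thickness_nm = 1e9 * annual_ice_thickness_m(
    GLOBAL_ARRIVAL_RATE, COLD_POLE_AREA_FRACTION, CH4_ICE_DENSITY
)
photolysed_nm = thickness_nm * PHOTOLYSED_FRACTION

print(f"CH4 ice deposited per Earth year: {thickness_nm:.1f} nm")
print(f"of which photolysed: {photolysed_nm:.2f} nm")
```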

Figure 2 | Winter pole in Pluto-shine. a, Stack of 99 images showing a
bright, sunlit crescent and fainter reflected light from Pluto. North, defined
by the angular momentum vector, is up. b, Photometric model assuming
a uniform albedo (see Methods). c, Observation/model ratio, showing
that southern high latitudes are dark relative to equatorial latitudes.
d, Dependence of the ratio on the absolute value of latitude as indicated
by the x axis. The data points are for the Pluto-shine-illuminated southern
hemisphere. The horizontal bars indicate the width of the latitude bin and the
vertical bars show the standard deviation of the mean within each latitude
bin. The dashed curve is for the sunlit northern hemisphere (see Methods).

Figure 3 | Thermal environment. a, Model surface temperature history for
the equator and three northern latitudes on Charon from 1750-2050 for a
thermal inertia of 10 J m^-2 K^-1 s^-1/2. Envelopes for each latitude indicate
diurnal minimum and maximum temperatures. The dashed line is the 25 K
threshold below which CH4 is cold-trapped. b, Longest continuous duration
below that temperature each Charon year, averaged over thermal models
spanning the past 3 Myr. Dotted, solid, and dashed curves are for thermal
inertias (in J m^-2 K^-1 s^-1/2) of 2.5, 10 and 40, respectively.

When the winter pole re-emerges into sunlight, CH4 and N2
sublimate away rapidly, but heavier, less volatile products remain
behind. Assuming that the mass density of these photolytic products
is double that of CH4 ice, around 40 nm would accumulate
per Pluto winter, or 0.16 mm per million Earth years. They will be
exposed to other sources of energetic radiation including ultraviolet
and extreme-ultraviolet photon radiation directly from the Sun, driving
further photolytic chemistry” as well as sputtering erosion. At
Pluto and Charon’s mean heliocentric distance of 39 au, the solar Lyα
energy flux is 1.9 × 10^13 eV m^-2 s^-1, and that of the extreme-ultraviolet
(>12.4 eV) is 8.7 × 10^11 eV m^-2 s^-1. The charged particle flux from the
ambient solar wind, coronal mass ejections and interstellar pickup ions
is highly variable, and has energies ranging from a few electronvolts to
megaelectronvolts. The production efficiency of tholin is not known
for Charon’s circumstances, but if it were 50% that would translate to
around 30 cm of tholin produced at Charon’s poles over 4 billion years.
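
The long-term accumulation figures follow from scaling the per-winter deposit by the number of Pluto years elapsed. The sketch below uses only numbers quoted in the text (40 nm per Pluto winter, a 248-Earth-year orbit, a 50% production efficiency and 4 billion years); the variable names are illustrative.

```python
# Values quoted in the text.
DEPOSIT_PER_WINTER_NM = 40.0      # photolytic products per Pluto winter
PLUTO_YEAR_EARTH_YEARS = 248.0    # one winter per pole per Pluto year
THOLIN_EFFICIENCY = 0.5           # hypothetical production efficiency
AGE_YEARS = 4e9

NM_PER_MM = 1e6
MM_PER_CM = 10.0

winters_per_myr = 1e6 / PLUTO_YEAR_EARTH_YEARS
rate_mm_per_myr = DEPOSIT_PER_WINTER_NM * winters_per_myr / NM_PER_MM

total_mm = rate_mm_per_myr * (AGE_YEARS / 1e6)
tholin_cm = THOLIN_EFFICIENCY * total_mm / MM_PER_CM

print(f"accumulation rate: {rate_mm_per_myr:.2f} mm per million Earth years")
print(f"tholin over 4 billion years at 50% efficiency: {tholin_cm:.0f} cm")
```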

Strongly coloured tholins have been made in the laboratory from a
variety of CH4 and N2 ice mixtures using diverse radiation sources from
charged particles to ultraviolet photons**”. For irradiation approximately
10× greater*®?°, experiments that initially produced a red-orange
tholin generally go on to produce colourless and much darker
materials, indicating carbonization (graphitization). The fact that
Charon’s pole is not completely blackened requires a balance between
the production and further processing of tholins and processes such as
in-falling H2O ice dust, ejecta or micrometeorite impact gardening that
would mix the tholins into the uppermost millimetres to few metres
of Charon’s H2O regolith. Two distinct sources could contribute dust.
Low-velocity dust from Pluto’s small satellites is estimated to produce
a 3 cm coating over the course of 4 billion years (ref. 26) with a
composition that is probably dominated by H2O ice. This is an order of
magnitude slower than our estimated tholin production, although it
could be augmented by ejecta from large impacts elsewhere on Charon.
Higher-velocity Kuiper belt debris impact rates are poorly constrained,
but models based on lunar rates”””* imply gardening to centimetre
depths on a timescale of 10^7 yr (see Methods), during which time we
estimate about a millimetre of tholin accumulates. By diluting the
accumulating tholin in local substrate material, this shallow gardening may
explain the enduring brightness of a few relatively recent craters where
accumulating tholin is mixed into more neutral-coloured H2O-rich
ejecta, rather than being mixed into the already-darkened substrate,
as elsewhere.

The distribution of dark, reddish material around Charon’s northern 
pole is notable for its generally symmetric distribution across longitudes 
and its gradual increase with latitude, although there are local 
irregularities associated with craters, topographic features and perhaps 
subsurface variations in thermal properties. These characteristics, and 
the existence of an albedo feature around the southern pole with a 
similar latitude dependence, are consistent with our hypothesis that 
the combination of Pluto’s escaping atmosphere and Charon’s long, cold 
winters enables CH, to be seasonally cold-trapped at high latitudes, 
where some is photolytically processed into heavier molecules that are 
subsequently converted to reddish tholin-like materials. The symmetry 
argues against recent polar wander on Charon. Could the process occur 
elsewhere? Nix has a reddish spot”, but it and the other small satellites 
orbit farther from Pluto and have much lower masses, making the 
process less efficient (see Methods). 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Colour images. The colour image of Charon shown in Fig. 1a was obtained
on 2015 July 14 at 10:42 UT. The spacecraft was 73,000 km from Charon,
resulting in an MVIC image scale of 1.4 km per pixel. The Mission Elapsed Time
(MET) unique label of this observation was 0299176432. Calibration details are
described in ref. 6. The original image is shown in Extended Data Fig. 3c. Figure 1a
shows this image reprojected on a polar stereographic projection and photometrically
corrected by dividing each pixel by a Hapke model*” computed for the same
illumination and viewing geometry. That model had uniform parameters across
the scene: single-scattering albedo w = 0.9, single-scattering phase function
P(g) = 0.8, backscattering amplitude B0 = 0.6, porosity parameter h = 0.0044 and
macroscopic roughness θ = 20° (parameters from ref. 31). Figure 1b shows the
latitude dependence of the image divided by the Hapke model, averaged over the
longitudes shown in Fig. 1a and normalized to unity at the equator. A similar
latitude dependence in Charon’s normal albedo has been reported from LORRI
observations*”. The longitude and wavelength dependence are shown in Fig. 1c, d,
respectively. More approach images are shown in Extended Data Fig. 1, without
reprojection or photometric correction.

Pluto-shine observations. New Horizons obtained 219 LORRI images of Charon’s
night side early on 2015 July 17 UT from a distance of approximately 3 × 10^6 km.
The spacecraft’s orientation is controlled by hydrazine thrusters, so full-resolution
LORRI exposures must be short to minimize smear as pointing bounces around
within a deadband. Longer exposures of 0.2 s were enabled by 4 × 4 pixel on-chip
binning, resulting in a total integration of 44 s. The images were grouped into
two sets, the first comprising 99 images with MET labels 0299398349 through
0299398716 acquired at a mean observation time of 00:26 UT. The second set
had 120 images with MET labels 0299405549 through 0299405916 and a mean
time of 02:26 UT. The two sets had image scales of 60 km per pixel and 62 km per
pixel, respectively. As LORRI was pointed close to the Sun to observe the night
side of Charon, the images were affected by scattered light patterns modified by
small pointing variations among the images. The scattered light variability in each
image set was modelled using principal component analysis on the complementary
set, after masking Charon and bright stars from the images. The eigenimages in
each set were used to subtract the scattered light contribution from the other set.
The images were then co-registered at the sub-pixel level and combined into two
stacks, the first of which is shown in Fig. 2a, the second in Extended Data Fig. 2a.

We modelled these observations by simulating Charon as a sphere approximated 
by 20,480 triangular facets. The bidirectional reflectance behaviour of each facet 
was represented with a Hapke model, using the same parameters as above. The Sun 
was treated as a point source and the distribution of light from Pluto was obtained 
by assuming it to be a Lambertian sphere with the appropriate size, location and 
illumination geometry. Being larger than Charon, Pluto casts some light onto 
Charon’s pole, albeit obliquely. Comparisons between data and model are shown 
for the first of the two image stacks in Fig. 2 and for the other stack in Extended 
Data Fig. 2. 

To compare Charon’s southern pole with the northern one as seen with the same 
instrument, we also ran a Hapke model with the same parameters for a LORRI 
full-resolution sunlit approach image obtained 2015 July 14 at 02:44 UT, from a
distance of 470,000 km, with image scale of 2.3 km per pixel. The MET label for 
this image was 299147776. It and the corresponding model are shown in Extended 
Data Fig. 2d and e and the latitude-dependent ratio is included as a dashed curve 
in Fig. 2d and Extended Data Fig. 2f, normalized to unity over the mean from 0° 
to 10° N. MVIC colour observations were not sufficiently sensitive to detect Pluto- 
shine on Charon’s night side. 
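
The image scales and total integration time quoted in this section are mutually consistent under the small-angle approximation, in which the scale is the range multiplied by the angular size of a pixel. The sketch below assumes a LORRI pixel of about 4.95 microradians, a value that is not given in the text; the ranges, binning factor, exposure time and image count are the quoted ones.

```python
# Assumption: LORRI angular pixel size in radians (not stated in the text).
LORRI_IFOV_RAD = 4.95e-6

# Values quoted in the text.
PLUTO_SHINE_RANGE_KM = 3e6     # approximate range of the night-side images
BINNING_FACTOR = 4             # 4 x 4 on-chip binning
APPROACH_RANGE_KM = 470_000    # range of the sunlit approach image
NUM_IMAGES = 219
EXPOSURE_S = 0.2


def image_scale_km(range_km, ifov_rad, binning=1):
    """Kilometres per (binned) pixel using the small-angle approximation."""
    return range_km * ifov_rad * binning


pluto_shine_scale = image_scale_km(PLUTO_SHINE_RANGE_KM, LORRI_IFOV_RAD, BINNING_FACTOR)
approach_scale = image_scale_km(APPROACH_RANGE_KM, LORRI_IFOV_RAD)
total_integration_s = NUM_IMAGES * EXPOSURE_S

print(f"Pluto-shine image scale: {pluto_shine_scale:.0f} km per pixel")
print(f"approach image scale: {approach_scale:.1f} km per pixel")
print(f"total integration: {total_integration_s:.0f} s")
```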

Thermal model. A standard one-dimensional finite element model? was used 
to account for the vertical heat flow within each element of Charon’s surface in 
response to diurnally and seasonally varying insolation!°. Diurnal and annual 
timescales differ greatly, so we represented Charon’s surface with a large number 
of layers (400) and broke its year into a large number of time steps (2 × 10^6, about
an hour per time step). We assumed a uniform bolometric Bond albedo A_B = 0.3
and emissivity ε = 0.9, ignoring the lower albedos of Charon’s poles. Diurnal
thermal inertia values were varied from 2.5 to 40 J m^-2 K^-1 s^-1/2, spanning the
range reported for Charon and Kuiper belt objects from Spitzer and Herschel 
observations!!" to icy Saturnian satellites observed by Cassini!”. The density 
and heat capacity of H₂O ice are fixed, so low thermal inertias require inefficient 
conduction between ice grains. Conduction can rise with compaction at depth, 
leading to higher seasonal thermal inertias. Simulations with higher conductivities 
up to that of monolithic H₂O ice at depths as shallow as a few tens of centimetres 
below the surface can raise polar winter temperatures by a few kelvin without 
greatly affecting diurnal temperature variations. Sunlight was assumed to be the 
only energy source, providing 870 mW m⁻² at Pluto’s mean heliocentric distance. 
We did not include radiogenic heating. If Charon’s rock fraction of around 60% 
(ref. 34) has chondritic radionuclide abundances, heat from their decay would 
contribute an additional 1.5 mW m⁻², preventing the temperature from falling 
below about 13 K at present, and higher in the distant past. Figure 3a shows the 
model surface temperature history over the course of three centuries for Charon’s 
northern hemisphere, assuming an intermediate diurnal thermal inertia value of 
Γ = 10 J m⁻² K⁻¹ s⁻¹/², no radiogenic heat, and no high-conductivity subsurface 
layer. High and low Γ bounding cases are shown in Extended Data Fig. 3. Diurnal 
variations are much greater for small Γ, as is the range of seasonal surface 
temperature extremes. For all of these scenarios, Charon’s high latitudes experience 
multi-decade episodes that are sufficiently cold to cold-trap CH₄ as ice. 
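
Two of the temperatures quoted in this section can be cross-checked with a simple radiative balance. The sketch below is not the finite element model itself: it assumes a surface in instantaneous equilibrium with a steady absorbed flux, and uses the albedo, emissivity and fluxes stated above (solar flux taken as 870 mW m⁻², radiogenic flux as 1.5 mW m⁻²). The function and variable names are illustrative.

```python
SIGMA_SB = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def equilibrium_temperature(absorbed_flux, emissivity):
    """Temperature (K) at which thermal emission balances a steady absorbed flux (W m^-2)."""
    return (absorbed_flux / (emissivity * SIGMA_SB)) ** 0.25

EMISSIVITY = 0.9          # value assumed in the thermal model
BOND_ALBEDO = 0.3         # value assumed in the thermal model
SOLAR_FLUX = 0.870        # W m^-2 at Pluto's mean heliocentric distance
RADIOGENIC_FLUX = 1.5e-3  # W m^-2, chondritic estimate quoted in the text

# Floor temperature if radiogenic heat were the only energy source.
t_radiogenic_floor = equilibrium_temperature(RADIOGENIC_FLUX, EMISSIVITY)

# Instantaneous equilibrium for normally incident sunlight (no conduction, no rotation).
t_subsolar = equilibrium_temperature((1.0 - BOND_ALBEDO) * SOLAR_FLUX, EMISSIVITY)

print(f"radiogenic floor: {t_radiogenic_floor:.1f} K")
print(f"subsolar equilibrium: {t_subsolar:.1f} K")
```

Under these assumptions the radiogenic floor comes out close to the quoted 13 K, and the subsolar equilibrium value is close to the roughly 60 K summer maximum used later for thermal escape.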

As the spin pole of the system precesses and the longitude of the perihelion 
of its heliocentric orbit regresses, the insolation patterns evolve, modifying 
the surface temperature history over the course of a Pluto year. This variability 
is accommodated in Fig. 3b by averaging the longest continuous period below 
25 K during a Pluto year over both hemispheres and over three million years of 
orbital history. 

Loss mechanisms and cold-trapping. CH₄ molecules can be lost from Charon’s 
surface environment by several mechanisms. First we consider thermal escape to 
space. For a Maxwell-Boltzmann distribution at 60 K, corresponding to the highest 
temperatures reached on Charon’s summer hemisphere (see Fig. 3a), about 1% of 
CH₄ molecules exceed Charon’s 590 m s⁻¹ escape velocity. The majority, which 
do not escape, hop on ballistic trajectories until they achieve escape velocity or 
encounter the cold pole and stick. A typical latitude boundary of the cold pole 
from Fig. 3b is 45°. A cold pole above that latitude occupies about 1/7th of Charon’s 
surface. If each ballistic hop is assumed to arrive in a completely new random 
location, it would take an average of seven random hops to encounter the cold 
pole at a time when it is that size. With 1% lost to space per hop, 93% would make 
it to the pole, leading to an accumulation rate of 2 × 10¹² molecules m⁻² s⁻¹. For a 
CH₄ ice density of 516 kg m⁻³, the deposition rate is 9 × 10⁻¹⁷ m s⁻¹, which adds 
up to about 0.3 μm at the pole over one Charon winter. Because the accumulation 
time is strongly latitude-dependent, the resulting ice distribution would be, too. 
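
The chain of estimates in this paragraph can be reproduced in a few lines. The sketch below is a back-of-envelope check, not the authors' calculation. It assumes a Maxwell-Boltzmann speed distribution, a single polar cap poleward of 45° latitude, the globally averaged arrival flux of 2.7 × 10¹¹ molecules m⁻² s⁻¹ quoted later in the Methods, and a winter lasting about a century. The century duration and all names are assumptions made here for illustration.

```python
import math

K_B = 1.380649e-23              # Boltzmann constant, J K^-1
M_CH4 = 16.043 * 1.66053907e-27 # mass of one CH4 molecule, kg

def fraction_faster_than(speed, temperature, mass):
    """Fraction of a Maxwell-Boltzmann gas with speed above `speed`."""
    x = speed / math.sqrt(2.0 * K_B * temperature / mass)
    return math.erfc(x) + 2.0 / math.sqrt(math.pi) * x * math.exp(-x * x)

def polar_cap_fraction(latitude_deg):
    """Fraction of a sphere's area poleward of `latitude_deg` (one pole only)."""
    return (1.0 - math.sin(math.radians(latitude_deg))) / 2.0

escape_fraction = fraction_faster_than(590.0, 60.0, M_CH4)  # about 1%
cap_fraction = polar_cap_fraction(45.0)                     # about 1/7
mean_hops = 1.0 / cap_fraction                              # about 7

N_HOPS = 7
ARRIVAL_FLUX = 2.7e11                       # molecules m^-2 s^-1, global average
fraction_reaching_pole = (1.0 - 0.01) ** N_HOPS
accumulation_rate = ARRIVAL_FLUX * N_HOPS * fraction_reaching_pole  # on the cold pole

ICE_DENSITY = 516.0                         # kg m^-3
deposition_rate = accumulation_rate * M_CH4 / ICE_DENSITY           # m s^-1

WINTER_SECONDS = 100 * 365.25 * 24 * 3600   # assumed century-long winter
winter_thickness_um = deposition_rate * WINTER_SECONDS * 1e6

print(escape_fraction, cap_fraction, fraction_reaching_pole)
print(accumulation_rate, deposition_rate, winter_thickness_um)
```

With these inputs the script gives an escape fraction near 1%, a cap fraction near 1/7, about 93% of molecules reaching the pole, and a deposition rate near 9 × 10⁻¹⁷ m s⁻¹, in line with the figures in the text.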

Another potential loss mechanism is ionization. If a molecule is ionized, it 
becomes coupled to the magnetic field in the solar wind and is thus swept from 
the system. The cross-section of a CH₄ gas molecule to Lyα radiation is estimated 
as 1.8 × 10⁻²¹ m². The solar Lyα flux at Charon’s mean heliocentric distance of 
39 au is 1.9 × 10¹² m⁻² s⁻¹, so the probability of photoionization is about 0.1 per 
Earth year. At typical thermal speeds of 200-300 m s⁻¹, CH₄ molecules can traverse 
Charon’s diameter in less than 2 h, so the probability of photoionization during 
seven ballistic hops is low. Likewise, little CH₄ would be photoionized between 
its escape from Pluto and arrival at Charon, considering that Pluto’s exobase 
temperature is around 70 K, so molecules escaping at the tail of the Maxwellian 
velocity distribution would reach Charon in just a few tens of hours. 
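
The photoionization estimate is a product of cross-section, flux and exposure time. The short check below assumes the flux is expressed per square metre (1.9 × 10¹² photons m⁻² s⁻¹), which is the reading that reproduces the quoted 0.1 per Earth year, and takes 2 h as an upper bound on the duration of each of seven hops. The names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

CROSS_SECTION = 1.8e-21  # m^2, CH4 gas molecule at Lyman-alpha
LYA_FLUX = 1.9e12        # photons m^-2 s^-1 at 39 au (assumed units, see lead-in)

rate_per_second = CROSS_SECTION * LYA_FLUX
probability_per_year = rate_per_second * SECONDS_PER_YEAR

N_HOPS = 7
HOP_HOURS = 2.0          # upper bound on the time per hop, from the text
probability_during_hops = rate_per_second * N_HOPS * HOP_HOURS * 3600

print(f"per Earth year: {probability_per_year:.2f}")
print(f"during seven hops: {probability_during_hops:.1e}")
```

The per-hop exposure is so short that the probability of ionization before cold-trapping stays well below one in a thousand.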

Surface pressure. An estimate of the surface pressure at Charon can be obtained by 
assuming steady state, with arriving CH₄ molecules undergoing an average of seven 
ballistic hops before being lost to space or cold-trapped on a 45° radius cold pole, 
so each element of Charon’s surface experiences seven times the globally averaged 
CH₄ arrival flux of 2.7 × 10¹¹ molecules m⁻² s⁻¹ at speeds that are consistent with 
the Boltzmann velocity distribution, resulting in a momentum flux or pressure of 
roughly 1 × 10⁻¹¹ Pa, which is three orders of magnitude smaller than the 3σ upper 
limit from New Horizons. When the cold pole is smaller, the pressure increases 
because more hops occur, and the reverse is true when the cold pole is larger, so 
this pressure is very approximate. However, it enables us to estimate the threshold 
temperature of 25 K for cold-trapping CH₄ via its vapour pressure. Assuming 
the radius of a CH₄ molecule is 2 × 10⁻¹⁰ m, this pressure corresponds to a mean 
free path that is >100 Charon radii, so collisions between CH₄ gas molecules can 
be ignored. 
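
An order-of-magnitude version of this pressure and mean-free-path estimate is sketched below. It is one plausible reading of the argument rather than the authors' exact calculation. The pressure is taken as number flux times molecular mass times mean thermal speed, and the mean free path uses the hard-sphere kinetic-theory formula. The gas temperature of 40 K and the Charon radius of 606 km are assumptions introduced here, not values from this section.

```python
import math

K_B = 1.380649e-23               # Boltzmann constant, J K^-1
M_CH4 = 16.043 * 1.66053907e-27  # kg

ARRIVAL_FLUX = 2.7e11    # molecules m^-2 s^-1, global average from the text
N_HOPS = 7               # hops per molecule, from the text
MOLECULE_RADIUS = 2e-10  # m, from the text
GAS_TEMPERATURE = 40.0   # K, assumed representative surface temperature
CHARON_RADIUS = 606e3    # m, assumed

# Mean speed of a Maxwell-Boltzmann gas.
mean_speed = math.sqrt(8.0 * K_B * GAS_TEMPERATURE / (math.pi * M_CH4))

# Momentum flux carried by seven times the arrival flux.
pressure = N_HOPS * ARRIVAL_FLUX * M_CH4 * mean_speed

# Hard-sphere mean free path: 1 / (sqrt(2) * n * pi * d^2), with d = 2 r.
number_density = pressure / (K_B * GAS_TEMPERATURE)
collision_cross_section = math.pi * (2.0 * MOLECULE_RADIUS) ** 2
mean_free_path = 1.0 / (math.sqrt(2.0) * number_density * collision_cross_section)

print(f"pressure: {pressure:.1e} Pa")
print(f"mean free path: {mean_free_path / CHARON_RADIUS:.0f} Charon radii")
```

Under these assumptions the pressure is close to 1 × 10⁻¹¹ Pa and the mean free path exceeds 100 Charon radii, so the collisionless treatment is self-consistent.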

Photolysis of CH₄ ice. The interplanetary medium scatters Lyα photons from 
the Sun. The flux from this source observed during the New Horizons flyby was 
about 50% larger than predicted, or about 140 R averaged over the anti-Sun 
hemisphere, which illuminates the winter pole, equating to a flux of 3.5 × 10¹¹ Lyα 
photons m⁻² s⁻¹. This appears to be the largest source of night-side ultraviolet 
illumination at wavelengths between 10 nm and 133 nm that efficiently break bonds 
in the CH₄ molecule. Integrating over these wavelengths for the 1,000 stars 
with largest apparent ultraviolet fluxes using IUE spectra and Kurucz models 
and dividing by two to account for the visible 2π steradians of sky contributes an 
additional 1.1 × 10¹⁰ photons m⁻² s⁻¹. The Lambert absorption coefficient of CH₄ 
ice at the Lyα wavelength (1216 Å) is 19 μm⁻¹, so a 53-nm-thick CH₄ ice coating 
has optical depth unity. The Lyα cross-section of individual CH₄ ice molecules 
has been reported as 1.4 × 10⁻²¹ m², enabling an estimation of the probability 
of photolysis of an unshielded CH₄ ice molecule on Charon’s winter pole as 1.5% 
per Earth year. For 2.7 × 10¹¹ molecules m⁻² s⁻¹ arriving at Charon, concentration 
by cold-trapping on approximately 1/7th of Charon’s surface area results in the 
accumulation of around 3 nm of CH₄ ice per Earth year, or a layer that is optically 
thick to Lyα in 18 yr. Only the skin reachable by ultraviolet photons can be 
photolysed. For steady state at this deposition rate, around 26% of the CH₄ would be 
processed before becoming buried beneath enough CH₄ ice to shield it from further 
ultraviolet photolysis. Over a century-long accumulation cycle at this rate, about 
21% is processed. 
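
The processed fractions can be approximately recovered with a simple burial model. The sketch below is an assumption about how such numbers may arise, not a statement of the authors' procedure. Each molecule is taken to be photolysed at the unshielded rate of 1.5% per Earth year, attenuated exponentially with the depth of ice deposited above it, and the accumulated dose is treated linearly without saturation. The names are illustrative.

```python
import math

ABSORPTION_COEFF = 19e6  # m^-1, i.e. 19 per micrometre at Lyman-alpha
DEPOSITION_RATE = 3e-9   # m of CH4 ice per Earth year
PHOTOLYSIS_PROB = 0.015  # per Earth year for an unshielded molecule

# Depth at which the ice reaches optical depth unity, and the time to bury to it.
e_folding_depth = 1.0 / ABSORPTION_COEFF        # m
burial_time = e_folding_depth / DEPOSITION_RATE # Earth years

# Steady state: integrate p * exp(-t / burial_time) over all later times.
steady_state_fraction = PHOTOLYSIS_PROB * burial_time

def cycle_fraction(cycle_years):
    """Average processed fraction when deposition stops after `cycle_years`.

    A molecule laid down at time t0 accumulates dose only until the cycle ends;
    averaging over t0 uniformly in [0, cycle_years] gives the closed form below.
    """
    ratio = burial_time / cycle_years
    return steady_state_fraction * (1.0 - ratio * (1.0 - math.exp(-1.0 / ratio)))

century_fraction = cycle_fraction(100.0)

print(f"optical depth unity at {e_folding_depth * 1e9:.0f} nm, reached in {burial_time:.0f} yr")
print(f"steady state: {steady_state_fraction:.0%}, century cycle: {century_fraction:.0%}")
```

This toy model gives about 26% in steady state and about 22% for a century-long cycle, close to the quoted 26% and 21%.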

Impacts and gardening. Only one of the bright-ray craters in Charon’s polar region 
appears to be resolved. Its darker, inner region could mark the rim, or it could be 
ejecta given that many of Charon’s other craters feature dark inner ejecta. If the 
dark/bright boundary is the rim, the diameter of the crater itself is around 5-6 km; 
otherwise, scaling from fully resolved craters, the diameter could be as small as 
approximately 1.5 km. In addition to uncertainty about crater size, the impactor 
flux in the outer Solar System for objects below around 100 km is not well known. 
Using the ‘knee’ model of refs 40 and 41 (consistent with the size distribution of larger craters 
on Pluto and Charon), one crater 5-6 km or larger in diameter should typically 
form somewhere on Charon approximately every million years. The probability of 
occurrence in dark polar material is lower, in proportion to its smaller surface area. 
If the crater is smaller than 5-6 km, the occurrence frequency would be higher, 
but the knee model used predicts many more craters smaller than 10 km than are 
observed on Pluto or Charon, suggesting that the timescale for the formation of 
the observed craters could be considerably longer than a million years. 

No detailed modelling has so far been conducted for impact gardening of outer 
Solar System objects. Two studies of lunar impact gardening found similar results 
for 10⁷ yr timescales (refs 27 and 42). A 50% probability that material has been excavated down 
to a depth of 1 cm over that interval was estimated in ref. 27, while ref. 42 found that 
all of the study points in a grid were excavated at least once down to a depth of 
around 1 cm. The two studies diverged for longer timescales, with ref. 42 deriving a 
disturbed depth of 10cm in 10° yr and the model of ref. 27 taking 10° yr to achieve 
the same disturbed depth. 

The rate of gardening in the lunar example is calculated for impactors of all 
sizes, from the largest objects to hit the Moon down to micrometeorites and dust- 
sized particles. Extrapolation to the outer Solar System requires knowledge of the 
impactor flux, which, as described above, is poorly constrained. The lunar flux 
used by ref. 27 is larger than the best estimate from refs 40 and 41 for Charon 
for impactors larger than around 0.1 mm by up to several orders of magnitude, 
but smaller than the extrapolated knee model for smaller impactors by similar 
amounts. The dust flux (particles below 1 cm in diameter) estimated from the 
New Horizons Student Dust Counter is four to six orders of magnitude below the 
knee model. Extrapolating from that model to much smaller sizes or from the New 
Horizons dust measurement to much larger sizes is highly uncertain, illustrating 
the broad range of the potential impactor flux. 

Other instances. A natural question to ask is whether our hypothesized 
mechanism for the production of Charon’s dark, red poles should produce similar 
deposits elsewhere. Unlike the case for Charon, escape velocities on Pluto’s 
smaller satellites are negligible compared with the thermal speeds. They intercept 
Pluto’s escaping atmosphere only in proportion to their geometric cross-sections, 
with no enhancement effect from CH₄ molecules ballistically hopping around 
their surfaces. We consider Nix, as it exhibits a prominent red spot, although 
similar calculations could be done for any of the small satellites. Nix’s effective 
radius is 20 km and its orbital distance is 48,760 km, so it intercepts 1.7 × 10⁻⁷ of 
Pluto’s escaping CH₄, or 8 × 10¹⁸ molecules s⁻¹. Nix’s escape velocity is roughly 
10-20 m s⁻¹, so only CH₄ molecules that happen to hit an extremely cold region 
on their first impact will stick. Assuming that Nix has a long-duration cold pole like 
Charon does (it might not, if its pole precesses rapidly), the polar orientation with 
respect to radially outflowing gas from Pluto would diminish the accumulation rate 
by another factor of four, leading to an accumulation rate of around 1 × 10⁻⁷ μm yr⁻¹, 
about 20,000× slower than the accumulation rate at Charon’s cold pole. Such slow 
accumulation might not be competitive with impact erosion of Nix’s surface. 

Code availability. New Horizons MVIC and LORRI images were processed using 
the US Geological Survey’s Integrated Software for Images and Spectrometers 
(ISIS), available at https://isis.astrogeology.usgs.gov. Thermal models were based 
on J.R. Spencer’s thermprojrs.pro, available at 
https://www.boulder.swri.edu/~spencer/thermprojrs. Hapke reflectance models were 
based on M.W. Buie’s bidr2.pro, available at 
http://www.boulder.swri.edu/~buie/idl/pro/bidr2.html. 

Extended Data Figure 1 | Three MVIC colour images obtained on 
approach showing Charon’s northern pole as Charon rotates. 

a, Observation obtained 2015 July 11 3:35 UT, with MET label 0298891582. 
b, Observation obtained 2015 July 13 3:38 UT, with MET label 0299064592. 
c, The same observation as Fig. 1a of the main text, obtained 2015 July 14 
at 10:42 UT, with MET label 0299176432. Unlike in Fig. 1a, these images 
are not re-projected or divided by photometric models. North, as defined 
by the angular momentum vector, is up. 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved 




[Plot axes: reflectance ratio against |latitude| (degrees).] 


Extended Data Figure 2 | Additional panchromatic observations of 
Charon’s poles. a, Second stack of 120 images of Charon’s southern 
hemisphere illuminated by Pluto-shine obtained approximately 2 h after 
the stack in Fig. 2a. b, Corresponding photometric model. c, Observation/model 
ratio. d, Sunlit northern hemisphere. e, Corresponding photometric 
model. f, Similar to Fig. 2d, with the first Pluto-shine stack indicated by 
blue points and the second stack indicated by red points (offset left and 
right for clarity). The horizontal bars indicate the widths of the latitude 
bins and the vertical bars indicate the standard deviation of the mean 
within each latitude bin. 

Extended Data Figure 3 | Thermal models for cases of low and high thermal inertia. a, Thermal history for lower limit thermal inertia 
Γ = 2.5 J m⁻² K⁻¹ s⁻¹/². b, Thermal history for upper limit thermal inertia Γ = 40 J m⁻² K⁻¹ s⁻¹/². 

Calculation of the axion mass based on high-temperature 
lattice quantum chromodynamics 

S. Borsanyi, Z. Fodor, J. Guenther, K.-H. Kampert, S. D. Katz, T. Kawanai, T. G. Kovacs, S. W. Mages, A. Pasztor, 
F. Pittler, J. Redondo, A. Ringwald & K. K. Szabo 


Unlike the electroweak sector of the standard model of particle 
physics, quantum chromodynamics (QCD) is surprisingly 
symmetric under time reversal. As there is no obvious reason for 
QCD being so symmetric, this phenomenon poses a theoretical 
problem, often referred to as the strong CP problem. The most 
attractive solution for this requires the existence of a new particle, 
the axion, a promising dark-matter candidate. Here we determine 
the axion mass using lattice QCD, assuming that these particles 
are the dominant component of dark matter. The key quantities 
of the calculation are the equation of state of the Universe and the 
temperature dependence of the topological susceptibility of QCD, 
a quantity that is notoriously difficult to calculate, especially 
in the most relevant high-temperature region (up to several 
gigaelectronvolts). But by splitting the vacuum into different 
sectors and re-defining the fermionic determinants, its controlled 
calculation becomes feasible. Thus, our twofold prediction helps 
most cosmological calculations to describe the evolution of the 
early Universe by using the equation of state, and may be decisive 
for guiding experiments looking for dark-matter axions. In the next 
couple of years, it should be possible to confirm or rule out post- 
inflation axions experimentally, depending on whether the axion 
mass is found to be as predicted here. Alternatively, in a pre-inflation 
scenario, our calculation determines the universal axionic angle that 
corresponds to the initial condition of our Universe. 

In this Letter, we use the lattice formulation of QCD, that is, we 
discretize space-time on a four-dimensional lattice with N_t and N_s 
points in the temporal and spatial directions. The lattice spacing is 
denoted by a, the box size by L = N_s a, the temperature by T = (aN_t)⁻¹ 
and the volume of space-time by V = N_s³ N_t a⁴. 
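
As a small illustration of these definitions, the lattice spacing implied by a given temperature and temporal extent follows from T = 1/(a N_t), with the conversion factor ħc between MeV and fm. The temperature, N_t and N_s used below are illustrative values chosen here, not parameters taken from the paper's ensembles.

```python
HBAR_C = 197.3269804  # MeV fm

def lattice_spacing_fm(temperature_mev, n_t):
    """Lattice spacing a (fm) from T = 1 / (a * N_t)."""
    return HBAR_C / (temperature_mev * n_t)

def box_size_fm(temperature_mev, n_t, n_s):
    """Spatial box size L = N_s * a (fm)."""
    return n_s * lattice_spacing_fm(temperature_mev, n_t)

def spacetime_volume_fm4(temperature_mev, n_t, n_s):
    """Space-time volume V = N_s^3 * N_t * a^4 (fm^4)."""
    a = lattice_spacing_fm(temperature_mev, n_t)
    return n_s ** 3 * n_t * a ** 4

# Hypothetical example: T = 150 MeV on an N_t = 8, N_s = 32 lattice.
a = lattice_spacing_fm(150.0, 8)
L = box_size_fm(150.0, 8, 32)
print(f"a = {a:.4f} fm, L = {L:.2f} fm")
```

At fixed N_t, higher temperatures correspond to finer lattices, which is one reason the continuum limit at large temperature is demanding.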

During the expansion of the early Universe, a QCD transition 
occurred that confined quarks and gluons into hadrons. Our most 
important qualitative knowledge about this transition is that it is an 
analytic crossover, thus no cosmological relics are expected. Outside 
the narrow temperature range of the transition we know that the 
Hubble rate and the relationship between temperature and the age of 
the early Universe can be described by a radiation-dominated equation 
of state (EoS). The calculation of the EoS is a challenging task, and 
the determination of the continuum limit at large temperatures is 
particularly difficult. 

In our lattice QCD set-up, we used the staggered fermion 
discretization with four steps of stout-smearing. In our simulations, 
we included the two light quarks and the strange quark (‘2+1 flavours’) 
and when necessary we also added the charm quark (‘2+1+1 
flavours’). The quark masses are set to their physical values, but we use 
degenerate up and down quark masses and the small effect of isospin 
breaking is included analytically. The continuum limit is taken using 
three, four or five lattice spacings with temporal lattice extensions of 
N_t = 6, 8, 10, 12 and 16. In addition to dynamical staggered simulations, 
we also used dynamical simulations with 2+1 flavours of overlap 
quarks down to physical masses. The inclusion of an odd number 
of flavours was a non-trivial task, but this set-up was required for 
the determination of the temperature dependence of the topological 
susceptibility of QCD, χ(T), at large temperatures in the several GeV 
region. 

Charm quarks start to contribute to the EoS only above 300 MeV, 
so up to 250 MeV we used just 2+1 flavours of dynamical quarks. 
Connecting the 2+1 and the 2+1+1 flavour results at 250 MeV can 
be done smoothly. For large temperatures, the step-scaling method for 
the EoS of ref. 15 was applied. We determined the EoS with complete 
control over all sources of systematics all the way to the GeV scale. 

We used two different methods to set the overall scale in order to 
determine the EoS. One of them took the pion decay constant, the other 
applied the w_0 scale. Thirty-two different analyses (for example, the 
two different scale-setting procedures, different interpolations, keeping 
or omitting the coarsest lattice) entered our histogram method to 
estimate systematic errors. We also calculated the goodness of the fit 
and weights based on the Akaike information criterion, AICc, and 
we looked at the unweighted or weighted results. This provided the 
systematic errors on our findings. In the low-temperature region, 
we compared our results with the prediction of the hadron resonance 
gas (HRG) approximation and found good agreement (within error 
bars). This HRG approach is used to parameterize the EoS for small 
temperatures. In addition, we used the hard thermal loop approach 
to extend the EoS to high temperatures. 

In order to have a complete description of the thermal evolution of 
the early Universe, we supplement our QCD calculation for the EoS 
by including the rest of the standard model particles (leptons, bottom 
and top quarks, the photon, W, Z, Higgs bosons) and results on the 
electroweak transition. As a consequence, the final result on the EoS 
covers four orders of magnitude in temperature, from MeV to several 
hundred GeV. Figure 1 shows the EoS. The widths of the lines represent 
the uncertainties. Both the figure and the data can be used (similarly to 
figure 22.3 of ref. 20) to describe the Hubble rate and the relationship 
between temperature and the age of the Universe in a very broad 
temperature range. 

We now turn to the determination of the other key quantity, χ(T). 
In general, the action of QCD should have a term proportional to 
the topological charge of the gluon field, Q. This term violates the 
combined charge-conjugation and parity (CP) symmetry. The 
surprising experimental observation is that the proportionality factor 
of this term, θ, is unnaturally small; this is known as the strong 
CP problem. A particularly attractive solution to this fundamental 
problem is the so-called Peccei-Quinn mechanism. An additional 
scalar U(1) symmetric field is introduced. The underlying Peccei-Quinn 
U(1) symmetry is spontaneously broken (which can happen 
pre-inflation or post-inflation) and an axion field A acts as a massless 
Goldstone boson of the broken symmetry. The symmetry-breaking 
scale f_A is a free parameter. Owing to the chiral anomaly, the axion is 
coupled to the topological charge density, so the original potential of the 
axion field with its U(1) symmetry breaking gets tilted and has its 
minimum where (θ + A/f_A) = 0. This sets the proportionality factor of 
Q in the QCD action to zero and solves the strong CP problem. 
Furthermore, the axion acquires a mass m_A, which is given by 
m_A² = χ/f_A², where χ = ⟨Q²⟩/V is the susceptibility of the topological charge 
normalized by the space-time volume. We determined its value at T = 0, 
which was χ(T = 0) = 0.0245(24)(12) fm⁻⁴ in the isospin symmetric 
case; the first error in parentheses is statistical, the second is systematic. 
Isospin breaking results in a small, 12% correction, thus the physical 
value is χ(T = 0) = 0.0216(21)(11) fm⁻⁴ = [75.6(1.8)(0.9) MeV]⁴. 

Figure 1 | The effective degrees of freedom of the energy density, the 
entropy density and the heat capacity in the early Universe, and their 
ratios. Shown as a function of temperature T in the lower panel are the 
degrees of freedom as follows: g_ρ for the energy density, ρ = (π²/30) g_ρ T⁴; g_s for 
the entropy density, s = (2π²/45) g_s T³; and g_c for the heat capacity, c = (2π²/15) g_c T³. 
Neglecting the cosmological constant, the time dependence of the 
temperature in the early Universe is given by these factors as 

$$\frac{dT}{dt} = -\frac{2\pi^{3/2}}{3\sqrt{5}}\,\frac{T^{3}}{M_{\rm Pl}}\,\frac{\sqrt{g_\rho}\,g_s}{g_c},$$

where M_Pl is the Planck mass. The line width is 
chosen to be the same as our error bars (s.e.m.) at the vicinity of the QCD 
transition (T ~ 150 MeV), where we have the largest uncertainties. At 
temperatures T < 1 MeV the equilibrium EoS becomes irrelevant for 
cosmology, because of neutrino decoupling. The EoS comes from our 
calculation up to T = 100 GeV. At higher temperatures the electroweak 
transition becomes relevant and we use the results of ref. 22. Note that for 
temperatures around the QCD transition, non-perturbative QCD effects 
push the EoS away from the ideal gas limit, an approximation which is 
often used in cosmology; for example, g_s/g_c is reduced from this limit by 
about 35% (see upper panel). Also note that g_s/g_c has four local minima: 
near the muon threshold, the QCD transition, the W, Z-boson thresholds 
and the electroweak transition. For parameterizations for the QCD regime 
or for the whole temperature range see Supplementary Information. 
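
The two quoted forms of the zero-temperature susceptibility, and the zero-temperature axion mass they imply through m_A² = χ/f_A², can be checked numerically. In the sketch below the conversion uses ħc = 197.327 MeV fm. The symmetry-breaking scale f_A = 10¹² GeV is a hypothetical example value chosen here, since f_A is a free parameter.

```python
HBAR_C = 197.3269804  # MeV fm

def chi_fourth_root_mev(chi_fm4):
    """Convert a susceptibility in fm^-4 to chi^(1/4) in MeV."""
    return chi_fm4 ** 0.25 * HBAR_C

def axion_mass_ev(chi_quarter_mev, f_a_gev):
    """Zero-temperature axion mass m_A = sqrt(chi) / f_A, returned in eV."""
    sqrt_chi_mev2 = chi_quarter_mev ** 2
    f_a_mev = f_a_gev * 1e3
    return sqrt_chi_mev2 / f_a_mev * 1e6  # MeV -> eV

chi_quarter = chi_fourth_root_mev(0.0216)  # physical value quoted in the text
isospin_ratio = 0.0216 / 0.0245            # physical over isospin-symmetric value

F_A_GEV = 1e12                             # hypothetical symmetry-breaking scale
mass_ev = axion_mass_ev(chi_quarter, F_A_GEV)

print(f"chi^(1/4) = {chi_quarter:.1f} MeV")
print(f"isospin correction factor = {isospin_ratio:.2f}")
print(f"m_A = {mass_ev * 1e6:.2f} micro-eV for f_A = {F_A_GEV:.0e} GeV")
```

The conversion reproduces the quoted 75.6 MeV, the ratio of the two central values corresponds to the stated 12% reduction, and the mass scales inversely with f_A.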

On the lattice, χ can be conveniently calculated using a Q defined 
along the Wilson flow. An earlier study looked at χ(T) in the 
quenched approximation. A result was provided within the quenched 
framework, and reached a temperature about one-half to one-third 
of the necessary temperatures for axion cosmology (a similar study 
with somewhat less control over the systematics is in ref. 4). To obtain 
a complete result, dynamical quarks with physical masses should be 
used. Dynamical configuration production is, however, about three 
orders of magnitude more expensive computationally, and the χ(T) 
values are several orders of magnitude smaller than in the quenched 
case. Owing to cut-off effects, the continuum limit is far more difficult 
to carry out in dynamical QCD than in the pure gauge theory. All in 
all, we estimate that the brute-force approach to providing a complete 
result on χ(T) in the relevant temperature region would be at least ten 
orders of magnitude more expensive than the result of ref. 5. 

Figure 2 | Continuum limit of χ(T). Main panel, plot of the temperature 
dependence of the topological susceptibility of QCD, χ(T). The width of 
the line represents the combined statistical and systematic errors (s.e.m.). 
The dilute instanton gas approximation (DIGA) predicts a power behaviour 
of T⁻ᵇ with b = 8.16, and the lattice result shown here is close to this value. 
Inset, magnified view of the behaviour around the QCD transition. 

The huge computational demand and the physics issue behind the 
determination of χ(T) have two main sources: (a) in high-temperature 
lattice QCD, the most widely used actions are based on staggered 
quarks, and when dealing with topological observables staggered 
quarks have very large cut-off effects; and (b) the tiny topological 
susceptibility needs extremely long simulation threads to observe 
enough changes of the topological sectors. 

We solve both problems and determine the continuum result for 
χ(T) for the entire temperature range of interest. We call our proposed 
solution of problem (a) the ‘eigenvalue reweighting’ technique. The 
method is based on substituting the topology related eigenvalues of the 
staggered quark operator with the eigenvalues of the quark operator 
in the continuum. To solve problem (b), we propose to measure the 
logarithmic differential of the susceptibility instead of the susceptibility 
itself, which is related to quantities that are to be measured in fixed 
topological sectors. The final result is obtained with an integral, so we 
call our method the ‘fixed sector integral’ technique. Both techniques 
are explained in detail in the Methods. 

The CPU costs of the conventional technique scale as T⁸, whereas the 
new ‘fixed sector integral’ method scales as T⁰, so the reduction in CPU 
time is tremendous. This efficient technique is used to obtain the final 
result for χ(T). Since we work with continuum extrapolated quantities 
both for the ratios in the starting-point and for their changes, we could 
in principle use any action in the procedure: here we will use overlap 
and/or staggered actions. 

By combining these methods, χ(T) can be determined (see Fig. 2). 
The cut-off effects of staggered fermions (several thousand per cent) are 
removed, leaving a very mild (of the order of 10%) continuum 
extrapolation to be performed. In addition, the direct determination of χ(T) 
all the way up to 3 GeV means that we do not have to rely on the dilute 
instanton gas approximation (DIGA). Note that a posteriori the 
exponent predicted by DIGA turned out to be compatible with our finding, 
but its prefactor is off by an order of magnitude, similar to the quenched 
case. Though some of our simulations (see Supplementary Fig. 18) 
are already carried out with chiral (overlap) fermions, where large 
cut-off effects are a priori absent, it is an important task for the future to 
crosscheck these results with a calculation using chiral fermions only. 

As a possible application for these two cosmologically relevant 
lattice QCD results, we show how to calculate the amount of axionic 

Figure 3 | The relation between the axion’s mass m_A and the initial angle 
θ_0 in the pre-inflation scenario and the axion’s mass range in the 
post-inflation scenario. For the pre-inflation scenario, our result is shown by 
the blue line; the error (s.e.m.) is smaller than the line width. The 
post-inflation scenario corresponds to θ_0 = 2.155 with a strict lower bound on 
the axion’s mass of m_A = 28(2) μeV. The thick red line shows our result for 
the axion’s mass for the post-inflation case: for example, m_A = 50(4) μeV if 
one assumes that axions from the misalignment mechanism contribute 50% 
to dark matter. Our final estimate is 50 μeV < m_A < 1,500 μeV (the upper 
bound assumes that only 1% is the contribution from the misalignment 
mechanism, the rest comes from other sources, for example, topological 
defects). An experimental set-up to detect post-inflationary axions is given 
in Supplementary Information. The slight bend around m4 ~ 10° eV 
corresponds to an oscillation temperature at the QCD transition”>*. 




dark matter and how it can be used to determine the axion’s mass. 
χ(T) is a rapidly decreasing function of the temperature. Thus, at high 
temperature m_A (which is proportional to χ(T)^(1/2)) is small: in fact, it 
is much smaller than the Hubble expansion rate of the Universe at 
that time or temperature (H(T)). The axion does not yet ‘feel’ the tilt 
in the Peccei-Quinn ‘Mexican hat’-type potential, and it is effectively 
massless and frozen by the Hubble friction (a linear, friction-like term 
proportional to the Hubble constant in the equation of motion, see 
the second term of equation S29 in Supplementary Information). As 
the Universe expands, the temperature decreases, χ(T) increases and 
the axion mass also increases; meanwhile, the Hubble expansion rate 
(given by our EoS) decreases. As the temperature decreases to T_osc, the 
axion mass is of the same order as the Hubble constant (the oscillation 
temperature T_osc is defined by 3H(T_osc) = m_A(T_osc)). Around this time 
the axion field rolls down the potential, and starts to oscillate around 
the tilted minimum; the axion number density increases to a non-zero 
value, thus axions as dark matter are produced. The details of this 
production mechanism, usually called misalignment, are quite well 
known (see, for example, ref. 9). 
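
The defining condition 3H(T_osc) = m_A(T_osc) can be solved numerically once H(T) and χ(T) are specified. The sketch below is a toy illustration only and does not use the lattice results of this paper. It assumes a radiation-dominated Hubble rate with a constant g_ρ = 60, and a susceptibility that equals its T = 0 value below a reference temperature of 150 MeV and falls as T^(−8.16) above it. Both of these choices, the value f_A = 10¹² GeV, and all names are assumptions made here; the true normalisation of χ(T) differs from this toy form.

```python
import math

M_PLANCK_MEV = 1.2209e22   # Planck mass in MeV
CHI0_QUARTER_MEV = 75.6    # chi(T=0)^(1/4) from the text
POWER_LAW_EXPONENT = 8.16  # DIGA exponent b quoted in the Fig. 2 caption
T_REF_MEV = 150.0          # assumed temperature above which chi falls as T^-b
G_RHO = 60.0               # assumed constant effective degrees of freedom

def hubble_rate_mev(t_mev):
    """Radiation-dominated H = sqrt(8 pi rho / 3) / M_Pl with rho = (pi^2/30) g T^4."""
    return math.sqrt(4.0 * math.pi ** 3 * G_RHO / 45.0) * t_mev ** 2 / M_PLANCK_MEV

def axion_mass_mev(t_mev, f_a_mev):
    """Toy m_A(T) = sqrt(chi(T)) / f_A with a power-law chi(T) above T_REF_MEV."""
    suppression = min(1.0, (T_REF_MEV / t_mev) ** POWER_LAW_EXPONENT)
    chi = CHI0_QUARTER_MEV ** 4 * suppression
    return math.sqrt(chi) / f_a_mev

def oscillation_temperature_mev(f_a_mev, t_low=1.0, t_high=1e5):
    """Bisect (in log T) for the temperature where 3 H(T) = m_A(T)."""
    for _ in range(200):
        t_mid = math.sqrt(t_low * t_high)
        if 3.0 * hubble_rate_mev(t_mid) < axion_mass_mev(t_mid, f_a_mev):
            t_low = t_mid   # axion mass still dominates: T_osc lies higher
        else:
            t_high = t_mid  # Hubble friction dominates: T_osc lies lower
    return math.sqrt(t_low * t_high)

F_A_MEV = 1e15  # hypothetical f_A = 1e12 GeV
t_osc = oscillation_temperature_mev(F_A_MEV)
print(f"toy T_osc = {t_osc:.0f} MeV")
```

In this toy setting T_osc comes out near 1 GeV, which illustrates why χ(T) is needed well above the QCD transition temperature.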

In a post-inflationary scenario, the initial value of the angular 
degree of freedom (θ) of the axion field takes all values between 
−π and π, whereas in the pre-inflationary scenario only one angle, 
θ_0, contributes (all other values are inflated away). One should also 
mention that during the U(1) symmetry-breaking, topological strings 
appear, which decay and also produce dark-matter axions. In the pre- 
inflationary scenario they are inflated away. However, in the post- 
inflationary framework their role is more important. This sort of axion 
production mechanism is less well-understood, and in our final results 
it is necessary to make some assumptions about the amount of axions 
produced by topological strings (see Fig. 3 legend). 

The direct consequence of our results on χ and the EoS is the mass 
of the dark-matter candidate, the axion. For the pre-inflationary 
Peccei-Quinn symmetry-breaking scenario, the initial value of the 
axion field of our Universe (θ_0) determines the axion mass (and vice 
versa). In Fig. 3 we show this relationship between the axion’s mass 
m_A (and the symmetry-breaking scale f_A) and the initial angle θ_0. For 
the post-inflationary Peccei-Quinn scenario, the horizontal thick red 
interval shows the possible range on m_A. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Eigenvalue reweighting technique. Here we show how cut-off effects in χ arise 
with staggered quarks and propose a new method to efficiently suppress them. 

The cut-off effects are strongly related to the zero modes. To understand their 
importance, we first note that in the quark determinant every zero mode for each 
dynamical flavour contributes a factor m_f, the corresponding quark mass. In this 
way gauge configurations with zero modes are strongly suppressed in the path 
integral, especially if the quark masses are small. Owing to the index theorem, 
this also implies that light dynamical quarks strongly suppress higher topological 
sectors and thus χ itself. 

On the lattice, however, there can be strong cut-off effects in this suppression. 
This is because the suppression factor is not m_f but m_f + iλ_0, where λ_0 is an 
eigenvalue corresponding to the would-be zero mode of the staggered Dirac operator, 
D_st. The lack of exact zero modes can thus introduce strong cut-off effects and slow 
convergence to the continuum limit. Indeed, as long as the typical would-be zero 
eigenvalues are comparable to or larger than the lattice bare quark mass m_f, higher 
sectors are much less suppressed on the lattice than in the continuum. 

To improve the situation, even at finite lattice spacing we can identify the 
would-be zero modes and restore their continuum weight in the path integral. In 
case of rooted staggered quarks this amounts to a reweighting of each configuration 
(U) with a weight factor 


$$
w[U] = \prod_f \prod_{n=1}^{2|Q[U]|} \prod_{\sigma=\pm}
\left( \frac{m_f}{\sigma i \lambda_n[U] + m_f} \right)^{n_f/4} \qquad (1)
$$


where the second product runs over the would-be zero eigenvalues (λ_n) of the
staggered Dirac operator with positive imaginary part. For n_f fermion flavours the
third product takes into account the iλ → −iλ symmetry of the eigenvalue
spectrum. The n_f/4 factor takes rooting into account; the factor 2 next to |Q|
together with the ± symmetry make up for the fact that in the continuum limit the
staggered zero modes become fourfold degenerate.
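As a rough illustration of how such a weight could be evaluated for one configuration, the sketch below assumes the weight has the form w = ∏_f ∏_n ∏_{σ=±} [m_f/(σiλ_n + m_f)]^{n_f/4}, so that each ± pair combines into the real factor m_f²/(λ_n² + m_f²). The function name, the (mass, multiplicity) input format and any numbers fed to it are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def reweighting_factor(lambdas, flavours):
    """Weight w[U] for one gauge configuration (illustrative sketch).

    lambdas  : the 2|Q| would-be zero eigenvalues lambda_n > 0
               (imaginary parts, in lattice units)
    flavours : list of (m_f, n_f) pairs: bare quark mass and the number of
               degenerate flavours at that mass (assumed input format)
    """
    lam2 = np.asarray(lambdas, dtype=float) ** 2
    log_w = 0.0
    for m_f, n_f in flavours:
        # The sigma = +/- pair gives m_f^2 / (lambda_n^2 + m_f^2), which is real and <= 1.
        log_w += (n_f / 4.0) * np.sum(np.log(m_f**2 / (lam2 + m_f**2)))
    return float(np.exp(log_w))
```

With no would-be zero modes (the trivial sector) the function returns 1, and the weight tends to 1 as the eigenvalues approach zero, which is the behaviour expected in the continuum limit.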

We now turn to the most important part of the reweighting: the definition of 
the would-be zero modes. Since we are interested in χ, we identify the number of
these modes with the magnitude of the topological charge 2|Q| as obtained from
the gauge field after using the Wilson flow (see Supplementary Information). We
investigated two specific choices for the would-be zero modes. In the first approach,
we took the 2|Q| eigenmodes that have the largest magnitude of chirality among
the eigenmodes with the appropriate sign of chirality, positive if Q < 0 and negative
if Q > 0. In the second approach we took the 2|Q| eigenmodes with the smallest
eigenvalue magnitude. These two approaches are equivalent in the continuum limit, where
zero modes are exactly at zero and their chirality is unity. In practical simulations 
they give very similar results; we use the second approach in our analysis. 
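The second selection rule can be sketched in a few lines. This is only an illustration under stated assumptions: the measured charge is rounded to the nearest integer, only the positive-imaginary-part half of the spectrum is passed in or kept, and the function name is hypothetical.

```python
import numpy as np

def select_would_be_zero_modes(eigenvalues, Q):
    """Return the 2|Q| smallest positive eigenvalues, sorted in ascending order.

    eigenvalues : imaginary parts lambda of the low-lying staggered spectrum
    Q           : topological charge of the configuration (rounded here)
    """
    n_modes = 2 * abs(int(round(Q)))
    lam = np.asarray(eigenvalues, dtype=float)
    # Keep one member of each (+lambda, -lambda) pair.
    positive = np.sort(lam[lam > 0.0])
    if n_modes > positive.size:
        raise ValueError("not enough eigenvalues computed for this sector")
    return positive[:n_modes]
```

The returned array is what a weight-factor routine would consume; in the trivial sector it is empty.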

Since in the continuum limit the would-be zero eigenvalues get closer to zero, 
the reweighting factors tend to unity and in the continuum limit we recover the 
original Dirac operator. This way, however, even at finite lattice spacings the proper 
suppression of higher sectors is restored and cut-off effects are strongly reduced 
resulting in much faster convergence to the continuum limit. For completeness, 
we note that the above modification corresponds to a non-local modification of 
the path integral. (In this respect it stands on a footing similar to another method, 
which also modifies the quark determinant and which we also use in our staggered 
simulations: determinant rooting. As of today there is ample theoretical and 
numerical evidence for the correctness of the staggered rooting. See ref. 26 and its 
follow-ups.) In the following we provide several pieces of numerical evidence for 
the correctness of the approach. 

In Extended Data Fig. 1 we plot the distribution of the eigenvalues corresponding
to the would-be zero modes at a temperature of T = 240 MeV for different
lattice spacings. The distributions get narrower and their centre moves towards zero
as the lattice spacing is decreased. In Extended Data Fig. 2 we show the expectation
value of the reweighting factors in the first few sectors. In the continuum limit
⟨w⟩_Q = 1 should be fulfilled in each sector. The results nicely converge to 1.

In most of our runs, especially at large temperatures and small quark masses, 
the weights were much smaller than 1. As a result there are orders of magnitude 
differences between χ with and without reweighting. It is therefore important to
illustrate how the standard approach breaks down if the lattice spacing is large and 
how the correct result is recovered for very small lattice spacings. In the following 
we show two examples, Extended Data Figs 3 and 4, where the standard method 
produces cut-off effects so large that a reliable continuum extrapolation is not 
possible. In contrast, the lattice spacing dependence of the reweighted results is 
much milder. To make sure that the reweighted results are in the a-scaling regime,
for both cases we present a non-standard approach to determine χ and compare
it to reweighting.

In the first case (Extended Data Fig. 3) the temperature is just at the transition
point, T = 150 MeV, where we expect to get a value close to the zero temperature
susceptibility. This suggests that in this case the cut-off effects of the standard
method can be largely eliminated by performing the continuum limit of the
ratio χ(T, a)/χ(T = 0, a), where the finite temperature result is divided by the
zero temperature one at the same lattice spacing. We call this approach the ‘ratio
method’; see, for example, ref. 7. As can be seen in Extended Data Fig. 3, the
cut-off effects are indeed reduced. The continuum extrapolation so obtained is nicely
consistent with reweighting.

In the second case, Extended Data Fig. 4, we have a temperature well above 
the transition, T = 300 MeV. We see again that the standard method produces
results with large cut-off effects. The ratio method seems to perform better, but
the apparent scaling is misleading. Although a nice continuum extrapolation can
be done from lattice spacings N_t = 8, 10 and 12, the N_t = 16 result is much below
the extrapolation curve. The reweighting produces a result that is an order of
magnitude smaller. Below we introduce a new method, called the ‘fixed sector
integral technique’, which is tailored for large temperatures. The result so obtained,
where no reweighting is applied, agrees reasonably with the reweighted one in the 
continuum limit. 

These results provide numerical evidence for our expectations: reweighting not
only produces a correct continuum limit, it also eliminates the large cut-off effects
of staggered fermions.

Fixed sector integral technique. There are many proposals to increase the tunnelling
between the topological sectors; see, for example, refs 27-29. Here we forbid
sector changes and determine the relative weight of the sectors by measuring the
Q dependence of certain observables. (We note that a discussion of our method,
though only in the quenched approximation using coarse lattices, has appeared
in ref. 30.) Here we illustrate the method in the quenched approximation; for the
extension in the case of dynamical fermions, see Supplementary Information. The
gauge configurations are generated with a probability proportional to exp(−βS_g),
where β is the gauge coupling parameter and S_g is the gauge action. We consider
the following differentials:


$$
b_Q \equiv -\frac{d\log(Z_Q/Z_0)}{d\log T}
    = -\frac{d\beta}{d\log a}\,\langle S_g\rangle_{Q-0} \qquad (2)
$$


where Z_Q is the partition function of the system restricted to sector Q. In the
continuum limit the sectors are unambiguously defined, but on the lattice several
different definitions are possible (our choice is given later on). In equation (2) we
introduced the notation ⟨O⟩_{Q−0} = ⟨O⟩_Q − ⟨O⟩_0 for the difference of the
expectation values of an observable between sectors Q and 0. Equation (2) gives a
renormalized quantity: the ultraviolet divergences cancel in the difference of the
gauge actions. The important observation is that the necessary statistics to reach
a certain level of precision on values of b_Q does not depend on the temperature.
To obtain the relative weights Z_Q/Z_0, we need to integrate equation (2) in the
temperature. For that, we start from a temperature T_0, where the standard approach
is still feasible, and determine Z_Q/Z_0. Then by measuring the b_Q values for higher
temperatures, we can use the following integral in temperature T′ to obtain Z_Q/Z_0:


$$
\left.\frac{Z_Q}{Z_0}\right|_{T}
  = \exp\left(-\int_{T_0}^{T} d\log T'\, b_Q(T')\right)
    \left.\frac{Z_Q}{Z_0}\right|_{T_0} \qquad (3)
$$
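The temperature integral can be carried out numerically once b_Q has been measured on a grid of temperatures. The sketch below is a minimal illustration using the trapezoidal rule in log T; the function name and argument layout are hypothetical, and the first grid point is assumed to be the starting temperature T_0 at which Z_Q/Z_0 is known from a direct simulation.

```python
import numpy as np

def integrate_sector_weight(T, b_Q, ratio_at_T0):
    """Z_Q/Z_0 at every temperature in T, starting from its value at T[0].

    T           : increasing temperatures, T[0] = T_0 (any common unit)
    b_Q         : measured b_Q at those temperatures
    ratio_at_T0 : Z_Q/Z_0 at T_0
    """
    T = np.asarray(T, dtype=float)
    b_Q = np.asarray(b_Q, dtype=float)
    log_T = np.log(T)
    # Trapezoidal rule on each interval of log T, accumulated from T_0 upwards.
    steps = 0.5 * (b_Q[1:] + b_Q[:-1]) * np.diff(log_T)
    integral = np.concatenate(([0.0], np.cumsum(steps)))
    return ratio_at_T0 * np.exp(-integral)
```

For a temperature-independent b_Q the result reduces to a pure power law, Z_Q/Z_0 ∝ T^(−b_Q), which is a convenient sanity check.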


Let us make a remark about the volume dependence. As we increase the
temperature, the ratios Z_{Q+1}/Z_Q get smaller. This effect is in competition with the
infinite volume limit, which brings these ratios closer to 1. The question is how
many sectors are needed to determine χ reliably. χ is an intensive quantity, and as
such, its finite volume effects can be neglected, if the box size is large enough to
accommodate all correlation lengths in the system. In our quenched study (ref. 5), we
found that for LT_c = 2 the finite size effects on χ are negligible, where T_c denotes
the phase transition temperature.

For high temperatures only the Q = 0 and 1 sectors remain relevant and
⟨Q²⟩ = χV becomes small. Using the data from our quenched simulations (ref. 5), we
found that in a box size of L = 2/T_c, the contribution of Q ≥ 2 sectors to χ and also
χV are on the per cent level at 1.7T_c and they decrease rapidly with the
temperature (ref. 5). (A similar behaviour was found with dynamical fermions, see
Supplementary Information for details.) In the case when contributions from Z_2
and higher sectors are small it is appropriate to write χ = 2Z_1/Z_0/(1 + 2Z_1/Z_0)/V.
To the accuracy of O(χV), one can also use χ ≈ 2Z_1/Z_0/V, and then the decay
exponent of the susceptibility b can be simply obtained as


$$
b \equiv -\frac{d\log\chi}{d\log T} = b_1 - 4 \qquad (4)
$$


Here the term −4 reflects that the physical volume also changes with the
temperature. To derive the Stefan-Boltzmann limit of equation (4), we can use the
fact that for large temperatures β = −33 log a/(4π²). The gauge action difference is
given by the classical action of one instanton, the solution of the classical theory
with topological charge 1, ⟨S_g⟩_{1−0} = 4π²/3. We get b = 7 in the Stefan-Boltzmann
limit (for a ≠ 0 there are also some cut-off effects).
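The arithmetic behind b = 7 can be written out explicitly. The lines below are a worked check, assuming the sign conventions b_Q = −(dβ/d log a)⟨S_g⟩_{Q−0} and β = −33 log a/(4π²) up to an additive constant (so that β grows as a → 0); the one-instanton action 4π²/3 is taken from the text.

```latex
\begin{align*}
\frac{d\beta}{d\log a} &= -\frac{33}{4\pi^2}, \\
b_1 &= -\frac{d\beta}{d\log a}\,\langle S_g\rangle_{1-0}
     = \frac{33}{4\pi^2}\cdot\frac{4\pi^2}{3} = 11, \\
b &= b_1 - 4 = 7 .
\end{align*}
```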

As we have already mentioned, to reach the same level of precision on b_1 as the
temperature increases, the statistics can be kept constant. However, with an
increase in the spatial size N_s, the statistics have to be increased as N_s³ and as a
result the computer time goes as N_s⁶. This can be understood as follows: the gauge
action difference between sectors 1 and 0 will be approximately given by the action
of one instanton, which remains constant with increasing volume. The gauge action
S_g itself, however, increases with the volume and the cancellation in equation (2)
gets more severe. This ‘volume squared’ scaling problem can be mitigated by 
putting more and more topological charge into the box with increasing box size. 
If the topological objects are localized and well separated, then for large volumes 
the action difference between sectors 1 and 0 can be obtained from the difference 
between sectors Q and 0: 


$$
\langle S_g\rangle_{1-0} = \langle S_g\rangle_{Q-0}/|Q| \qquad (5)
$$


This relationship can be used to achieve a Q-fold increase in the signal-to-noise 
ratio, which translates into a Q-fold decrease in the necessary computing time. 
Next, we are going to check this relation in our numerical simulations. 
Numerical illustration. We have carried out several numerical simulations to test 
the new approach. Details of the algorithm and of our definition of Q can be found 
in the Supplementary Information. 

At finite lattice spacing, Q is not necessarily an integer, thus there is a certain 
degree of ambiguity in defining the sectors; this ambiguity disappears in the 
continuum limit. First, we looked at simulations, where we constrained Q to be 
larger than 0.5. The parameters can be found in Extended Data Table 1 under
the label ‘N_t-scan’. The results can be seen in Extended Data Fig. 5, where the charge
distributions for four different lattice spacings are plotted. Since the temperature
was high, the system did not explore configurations with Q > 1. The non-zero
width of the distributions is a lattice artefact. We can clearly observe that the peaks
get sharper towards the continuum. Also, the centre of the peak gets closer to 1.
This is expected, since our definition of Q, which is evaluated along the Wilson
flow, is renormalized. These centre values are also given in the plot, and are denoted
by z. We found that z is compatible with a 1 + c/N_t² behaviour. Thus these peaks
at finite lattice spacing correspond to the Q = 1 sector in the continuum. The z
factors can be used as an O(1/N_t²) correction to move the peaks of the Q distribution
closer to integer values. This correction is optional for the standard evaluation
of χ, but becomes useful for identifying higher Q sectors especially on coarse
lattices. Inclusion of this z factor corresponds to an O(a²) improvement in
renormalized quantities.

Making the peaks sharper can also be achieved by using improved gauge 
actions, like the tree-level Symanzik, Iwasaki or DBW2 actions. They suppress the 
topological dislocations and produce fewer tunnelling events. For a comparison 
of the topological properties of these gauge actions, see ref. 31. It can also be useful 
to improve the definition of Q along the lines presented in ref. 32, which pushes 
Q towards integer values. 

To explore sectors with higher Q, we defined the boundaries of the intervals as
(Q ± ½) × z. We found sharp peaks in the distribution of Q; see, for example,
Extended Data Fig. 6, where Q-histograms corresponding to the ‘Q-scan’
simulations are shown. For the parameters, see Extended Data Table 1. The peaks are
centred approximately around Q × z, using the z factor found in the Q = 1
simulations. In these runs we went up to Q = 8. As can be seen, with increasing Q
the charge distributions get broader. We also observe that changing the volume at
this particular temperature does not have a large effect on the distribution. We note
that the relative weights between the histograms are not included in the plot. These
can be determined using the fixed sector integral technique.
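The sector definition with rescaled boundaries amounts to a one-line rule. The sketch below assumes that sector Q is the interval ((Q − ½)z, (Q + ½)z) of the measured, non-integer charge; the function name and the value of z used in any call are hypothetical.

```python
import math

def assign_sector(q_measured, z):
    """Integer sector Q whose interval ((Q - 1/2) z, (Q + 1/2) z) contains q_measured.

    q_measured : topological charge measured on the lattice (not an integer)
    z          : peak position of the Q = 1 sector at this lattice spacing
    """
    # Dividing by z moves the peaks back to integers; adding 0.5 and taking
    # the floor then rounds to the nearest integer sector.
    return math.floor(q_measured / z + 0.5)
```

In a fixed-sector run, a proposed update would be rejected whenever this function returns a value different from the sector being simulated.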

It can also happen that a simulation gets trapped on the predefined sector 
boundary with a small acceptance ratio. This can be interpreted as a topological 
dislocation that is trying to disappear in each update, but is not allowed to 
disappear due to the Metropolis step. In the presented simulations, this happened 
in one out of 16 simulation streams, on an 8 × 64³ lattice at β = 7.30 in sector Q = 8.
In the gauge action difference the result was consistent with the untrapped streams,
thus the inclusion or non-inclusion of this stream did not change the value of b_8.
Nevertheless, we discarded this stream from our final analysis, because owing to the 
small acceptance it had a large autocorrelation time and was obviously non-ergodic. 

An important issue in fixed topology simulations is ergodicity. We used 
16 streams: the starting configurations were picked from a low temperature
simulation where topology decorrelated on a timescale of a few updates. Therefore
the streams can be regarded as independent. After a few thousand updates,
the gauge action was consistent among the different streams. As an example,
in Extended Data Fig. 7 we show the results of the Q-scan runs (see Extended
Data Table 1). Plotted is b_Q from equation (2). The odd-Q sectors are not shown
for clarity. The results obtained from different streams are all consistent with 
one another. 

It is also interesting to investigate the Q dependence of b_Q. As we mentioned
above, a naive expectation is that sector Q contains Q localized objects, each of them
independently contributing b_1 to the total b_Q, thus b_Q = |Q| × b_1 (see equation (5)).
With increasing volume the corrections to this equation should get smaller, owing
to the cluster decomposition principle. We found that even on 8 × 16³ lattices up to
Q = 8 the gauge action differences are consistent with a linear increase with Q. The
lines in Extended Data Fig. 7 represent the fit to all streams and charges assuming
equation (5). A good fit quality can be obtained. On the basis of this finding, we
used the 8 − 0 difference in a large volume run 8 × 64³, for which measuring the
1 − 0 difference would have been much more expensive.
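A fit of this kind, a straight line through the origin in |Q|, has a closed-form weighted least-squares solution. The sketch below is an illustration only: the function name is hypothetical, the errors are assumed to be independent and Gaussian, and the authors' actual fitting procedure is not specified in the text.

```python
import numpy as np

def fit_b1(Q, b_Q, sigma):
    """Weighted least-squares estimate of b_1 in the model b_Q = |Q| * b_1.

    Q     : topological charges of the sectors used
    b_Q   : measured b_Q values
    sigma : their standard errors
    Returns (b_1, standard error of b_1).
    """
    Q = np.abs(np.asarray(Q, dtype=float))
    b_Q = np.asarray(b_Q, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    denom = np.sum(w * Q**2)
    b1 = np.sum(w * Q * b_Q) / denom
    err = 1.0 / np.sqrt(denom)
    return float(b1), float(err)
```

With a single sector Q the estimate is b_Q/|Q| with error σ/|Q|, which makes the |Q|-fold gain in signal-to-noise explicit.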

We now investigate the cut-off and finite size effects in b_1 at a temperature of
T ≈ 6T_c. As we have already discussed, at such a high temperature the contributions
from the Q ≥ 2 sectors can be safely neglected when calculating the full susceptibility,
and the decay exponent is given by b = b_1 − 4. The upper plot in Extended Data Fig. 8
shows b as a function of the lattice spacing squared in a fixed physical volume, whereas
the middle panel shows it as a function of the aspect ratio N_s/N_t. The parameters
of the runs are listed in Extended Data Table 1 under the labels N_t-scan and
N_s-scan. Starting from aspect ratio ~3, we see no significant finite size effects. We
note that starting from aspect ratio 6, the boxes are large enough to accommodate
non-perturbative length scales, that is, LT_c > 1. We see no difference between boxes
with perturbative and non-perturbative size.

In the second set of simulations, we investigated the temperature dependence 
of the decay exponent. On the basis of the above results we took a lattice size of
8 × 32³; the parameters can be found in Extended Data Table 1 under the label
‘temperature scan’. The upper panel of Extended Data Fig. 9 shows the results for
b = b_1 − 4. Again we find agreement between the new data and the direct approach.
At one temperature we did a simulation on an 8 × 64³ lattice, where the exponent
was obtained from measuring the difference between the Q = 8 and 0 sectors,
b = b_8/8 − 4. We see no significant finite size effect.

To get the Z_1/Z_0 ratio, we performed a direct simulation at a temperature of
T_0 = 1.2T_c. From this temperature we used a trapezoidal integration of b_1 to obtain
the Z_1/Z_0 ratio as a function of temperature, up to 7T_c. In the lower panel of
Extended Data Fig. 9 we plot χ = 2Z_1/Z_0/(1 + 2Z_1/Z_0)/V, normalized by T_c⁴, which
can be compared to the lattice result obtained from the direct method (ref. 5). As we
already discussed, starting from a temperature of 1.7T_c the contribution of Q ≥ 2
can be neglected to obtain the susceptibility. We find a good agreement both for
the exponent and the susceptibility itself.
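The two-sector expression for the susceptibility is a special case of ⟨Q²⟩ = Σ_Q Q² Z_Q / Σ_Q Z_Q with the sectors +Q and −Q weighted equally. The sketch below evaluates that general sum from relative weights; the function name and the dictionary input format are hypothetical, and the equal weighting of ±Q is an assumption stated in the code.

```python
def susceptibility(ratios, volume):
    """chi = <Q^2>/V from relative sector weights (illustrative sketch).

    ratios : dict mapping each Q >= 1 to Z_Q/Z_0
             (sectors +Q and -Q are assumed to have equal weight)
    volume : four-volume V, in the units in which chi is wanted
    """
    # Normalization: Z_0 plus both signs of every non-trivial sector, over Z_0.
    norm = 1.0 + 2.0 * sum(ratios.values())
    q_squared = 2.0 * sum(Q * Q * r for Q, r in ratios.items()) / norm
    return q_squared / volume
```

Passing only the Q = 1 ratio reproduces 2Z_1/Z_0/(1 + 2Z_1/Z_0)/V, and adding a small Q = 2 entry shows directly how little the higher sectors contribute at high temperature.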

Extended Data Fig. 9 was made using 30 million 8 × 32³ and 1 million 8 × 64³
update sweeps. The cost of a simulation at T = 7T_c using the standard method can
be estimated from ref. 5: it would require about 250 million updates on 8 × 64³
lattices or about 2 billion on 8 × 32³, two orders of magnitude more than with
the novel method.

Code availability. A CPU code for configuration production can be obtained from 
Z.F. on request. The Wilson flow evolution code, which was used to determine Q, 
can be downloaded from https://arxiv.org/abs/1203.4469 

Extended Data Figure 1 | The probability distribution of the eigenvalues
corresponding to the would-be zero modes. The plot shows the results of dynamical
lattice QCD calculations with n_f = 2+1+1 flavours of staggered quarks at a temperature
of T = 240 MeV. The result is obtained using the chirality method described in the
text. The different colours refer to different lattice spacings; see key at right. N_t is
the number of lattice points in the temporal direction. Note that it is inversely
proportional to the lattice spacing: a = 1/(TN_t).
Extended Data Figure 2 | Expectation value of the weight factors in different
topological sectors, ⟨w⟩_Q, as a function of the lattice spacing squared. The plot shows
lattice results with n_f = 3+1 flavours of staggered quarks at a temperature of
T = 300 MeV. For clarity we plot the results as a function of 10/N_t². Different colours
correspond to different topological charge (Q) sectors. Error bars, s.e.m.
Extended Data Figure 3 | The lattice spacing dependence of the
topological susceptibility χ obtained from three different methods
described in the main text, namely standard, ratio and reweighting.
For the reweighting method, a continuum extrapolation is also shown.
The plot shows lattice results with n_f = 2+1+1 flavours of staggered
quarks at a temperature of T = 150 MeV. At this relatively low temperature
the standard (‘brute force’) method still cannot provide three lattice
spacings, which extrapolate to the proper continuum limit, though they
correspond to very fine lattices with N_t = 12, 16 and 20. Error bars are
s.e.m. and smaller than the symbols.
Extended Data Figure 4 | Lattice spacing dependence of the topological
susceptibility obtained from four different methods described in
the text, namely, standard, ratio, reweighting and integral. The plot
shows lattice results with n_f = 2+1+1 flavours of staggered quarks
at a temperature of T = 300 MeV. For the ratio method, a misleading
continuum extrapolation using N_t = 8, 10 and 12 is shown with a dashed
line. For the reweighting and integral methods, continuum extrapolations
are shown with bands. Error bars, s.e.m.
Extended Data Figure 5 | Histograms of the topological charge (Q) for different lattice spacings in simulations under the constraint Q > 0.5.
The relative fraction of configurations in each bin is plotted. The centre of the peaks, denoted by z, is also given. The plot shows pure gauge theory
simulations at T ≈ 6T_c.
Extended Data Figure 6 | Histograms of the topological charge
from fixed sector simulations for Q = 0-8. The relative fraction of
configurations in each bin is plotted. The sector boundaries are defined
using a z factor, as described in the text. Note that the relative weights
between the histograms are not included in the plot. These can be
determined from the fixed sector integral technique. The plot shows pure
gauge theory simulations at a temperature of T = 5T_c. Colour coding: red,
8 × 16³ lattice with fixed topological charges Q = 0-8; green, 8 × 32³ lattice
with fixed Q = 1; blue, 8 × 64³ lattice with fixed Q = 8.
Extended Data Figure 7 | Gauge action difference. The plot shows b_Q
as defined in equation (2) from pure gauge theory simulations on 8 × 16³
lattices at temperature T = 5T_c. The different points correspond
to independent simulation streams and different topological sectors.
A good fit can be obtained assuming ergodicity and that the action
difference scales linearly with the topological charge, see equation (5).
The horizontal lines correspond to this fit at the given Q values and the
reduced χ² is shown at top right. Error bars are s.e.m. and smaller than the
symbols.
Extended Data Figure 8 | Lattice-spacing and finite-volume dependence
of the decay exponent of the topological susceptibility, b. The decay
exponent b as a function of the lattice spacing squared shows the
continuum extrapolation (upper panel) whereas b as a function of the
linear extent of the lattice represents the infinite volume extrapolation
(lower panel). The plots show pure gauge theory simulations at
temperature T = 6T_c. The lines are obtained from a joint fit, which takes
into account both finite spacing and finite size effects. For the exponent
we obtain b = 7.1(3) in the continuum and infinite volume limit at this
particular temperature. This is in good agreement with our previous
estimate from the direct method (ref. 5), where we obtained b = 7.1(4).
Error bars, s.e.m.
Extended Data Figure 9 | Topological susceptibility in the pure gauge
theory. The results shown are from an earlier direct simulation (ref. 5) and from
the novel fixed sector integral technique. Upper panel, the temperature
dependence of decay exponent b; lower panel, temperature dependence of
the topological susceptibility itself. Filled red and pink circles show the
lattice results for the decay exponent on 8 × 32³ and 8 × 64³ lattices,
respectively. The open circles show the topological susceptibility from the
fixed sector integral technique. The green band refers to the result of ref. 5;
the key shows the abbreviated arXiv preprint number. The black arrow
indicates the Stefan-Boltzmann limit. We also show the result from the
DIGA with blue bands; see, for example, ref. 33. To convert the result into
units of T_c we used T_c/Λ_(MS-bar) = 1.26 from ref. 15. The width of the DIGA
prediction reflects the change of the renormalization scale from ½πT to
2πT. For the exponent b we see a good agreement for temperatures above
~4T_c; for smaller temperatures the lattice tends to give smaller values than
the DIGA. In the case of the susceptibility, the DIGA underestimates the
lattice result by about an order of magnitude; this has already been
observed in ref. 5. The ratio at T = 2.4T_c is K = 11.1(2.6), where the error
is dominated by the lattice calculation. Error bars, s.e.m.
Extended Data Table 1 | Parameters of fixed sector simulations in the pure gauge theory

β      T/T_c   N_s × N_t   Mupdates   Q     acc. Q≠0
N_s-scan
6.90   6.2     12 × 4      1.3        0,1   73%
               16 × 4      1.7        0,1   73%
               24 × 4      4.3        0,1   73%
               32 × 4      5.8        0,1   73%
               40 × 4      24         0,1   73%
               48 × 4      28         0,1   73%
N_t-scan
6.90   6.2     8 × 4       0.3        0,1   72%
7.23   6.1     12 × 6      1.3        0,1   92%
7.46   6.0     16 × 8      2.9        0,1   92%
7.65   6.0     20 × 10     4.1        0,1   98%
Q-scan
7.30   5.0     16 × 8      0.7        0     -
                           0.7        1     92%
                           0.7        2     88%
                           0.7        3     85%
                           2.6        4     83%
                           2.4        5     81%
                           2.4        6     78%
                           2.3        7     76%
                           2.3        8     73%
temperature scan
6.20   1.2     32 × 8      3.7        0,1   88%
6.35   1.5     32 × 8      3.7        0,1   93%
6.50   1.9     32 × 8      3.7        0,1   94%
6.70   2.4     32 × 8      3.7        0,1   94%
6.90   3.1     32 × 8      3.7        0,1   94%
7.10   3.9     32 × 8      3.7        0,1   94%
7.30   5.0     32 × 8      3.7        0,1   94%
7.30   5.0     64 × 8      1.2        0,8   64%
7.60   7.0     32 × 8      3.7        0,1   94%

The first column gives the gauge coupling, the second is the temperature in units of the transition temperature T_c, the third is the lattice extent, the fourth is the collected statistics in millions of updates, the fifth shows the Q sectors used and the last column gives the acceptance rates in the Q>0 sectors. The first block of parameters shows an N_s scan for which we fixed the gauge coupling and the temperature and varied the spatial extent of the lattice. The second block contains an N_t scan corresponding to a continuum extrapolation. Here the physical volume was fixed. In the Q scan we studied all topological charge sectors from Q=0 to Q=8. The last block corresponds to a temperature scan for which we fixed temporal lattice extent and changed the temperature from 1.2T_c up to 7T_c. In the trivial sector we always achieved an acceptance of 100%, which simply reflects the fact that at high temperatures the probability of exploring non-trivial topologies is very small. In the Q=1 sector, the acceptance was about 70% on the coarsest lattice. For this we had to switch off the overrelaxation step, which makes large moves in the configuration space, and would have almost always resulted in a topology change. On finer lattices, the acceptance was around 90% or better even in the presence of overrelaxation. In the Q>1 sectors, the acceptance gradually decreased as charge was increased; a simple explanation of this is that the disappearance probability of multiple instantons is approximately the sum of the individual disappearance probabilities. The worst acceptance was around 65% on an 8 × 64³ lattice in sector Q=8.

Emergence of a turbulent cascade in a quantum gas


Nir Navon¹, Alexander L. Gaunt¹, Robert P. Smith¹ & Zoran Hadzibabic¹


A central concept in the modern understanding of turbulence is 
the existence of cascades of excitations from large to small length 
scales, or vice versa. This concept was introduced in 1941 by 
Kolmogorov and Obukhov (refs 1, 2), and such cascades have since been
observed in various systems, including interplanetary plasmas (ref. 3),
supernovae (ref. 4), ocean waves (ref. 5) and financial markets (ref. 6). Despite much
progress, a quantitative understanding of turbulence remains a 
challenge, owing to the interplay between many length scales that 
makes theoretical simulations of realistic experimental conditions 
difficult. Here we observe the emergence of a turbulent cascade 
in a weakly interacting homogeneous Bose gas—a quantum fluid 
that can be theoretically described on all relevant length scales. 
We prepare a Bose-Einstein condensate in an optical box (ref. 7), drive
it out of equilibrium with an oscillating force that pumps energy 
into the system at the largest length scale, study its nonlinear 
response to the periodic drive, and observe a gradual development 
of a cascade characterized by an isotropic power-law distribution 
in momentum space. We numerically model our experiments using 
the Gross-Pitaevskii equation and find excellent agreement with 
the measurements. Our experiments establish the uniform Bose 
gas as a promising new medium for investigating many aspects 
of turbulence, including the interplay between vortex and wave 
turbulence, and the relative importance of quantum and classical 
effects. 

Compared to classical fluids, superfluids present fascinating peculiarities
such as irrotational and frictionless flow, which raises fundamental
questions about the character of turbulent cascades (refs 8, 9). Numerous
experiments on quantum-fluid turbulence have been performed with
liquid helium, exploring both vortex (refs 8, 10-12) and wave turbulence (refs 13-15),
but their theoretical understanding is hampered by the strong interactions
that make first-principles descriptions intractable. The situation
is a priori simpler for an ultracold, weakly interacting Bose gas, which is
often accurately described by the Gross-Pitaevskii equation (GPE) for
the complex-valued matter field ψ(r, t) (where r = (x, y, z) is the spatial
position and t is time; ref. 16). This equation is widely used to model
turbulence in quantum fluids (refs 17-21), but the numerical results have been
lacking experimental validation. Experimentally, qualitative evidence
for turbulence has been seen in quantum gases (refs 22-25), but quantitative
comparisons with theory were hindered by the inhomogeneous density
that results from harmonic trapping. Here we eliminate this problem
by studying turbulence in a homogeneous quantum gas.

The basic idea of our experiment is outlined in Fig. 1. We prepare a
quasi-pure Bose-Einstein condensate (BEC) of ⁸⁷Rb atoms in a cylindrical
optical box (ref. 7), and drive it out of equilibrium with a spatially uniform,
oscillating force that primarily couples to the lowest, dipole-like
axial mode. Our box has length L = 27(1) μm and radius R = 16(1) μm
(here and elsewhere, errors represent 1σ uncertainties). For our typical
atom number N ≈ 10⁵, the initial, equilibrium BEC has a chemical
potential μ/k_B ≈ 2 nK (where k_B is the Boltzmann constant), interaction
energy per particle E_int/k_B ≈ 1 nK and negligible kinetic energy, while
the critical temperature for Bose-Einstein condensation is T_c ≈ 50 nK.
The driving force is provided by a magnetic field gradient that creates
a potential U(r) = ΔUz/L, where the coordinate z is along the axis of
the box (Fig. 1a). The natural scale for ΔU, separating weak and strong
drives, is set by μ.

Numerical simulations in Fig. 1a show the microscopic behaviour 
of a shaken trapped gas, which gradually changes from simple uni- 
directional sloshing along z to an omnidirectional turbulent flow; in 
addition to the wave-like motion, we observe vortex lines (depicted in 
red), which are detected by computing the local circulation. (Snapshots 
of the turbulent flow do not obey the cylindrical symmetry of the 
(time-dependent) Hamiltonian. In real physical systems, any such 
symmetry is always broken by imperfections; in our simulations the 
symmetry breaking is provided by the position of the numerical grid.) 
Here the shaking amplitude is ΔU/μ = 1 and the longest shaking time
t_s = 2 s corresponds to 16 driving periods.

Experimentally, we probe the global properties of the gas by releasing 
it from the trap and imaging it along a radial direction (x) after a long

Figure 1 | From unidirectional sloshing to isotropic turbulence.
a, Gross-Pitaevskii simulations of a shaken, box-trapped Bose gas. The
blue shading indicates the gas density; the red lines indicate vortices.
b-d, Experimental absorption images taken along x after 100 ms of TOF
expansion, with N = 8 × 10⁴ atoms (upper panels), and the corresponding
angular distributions P(θ), averaged over 20 images taken under identical
conditions (lower panels). b, Initial BEC; c, after shaking for 2 s at 8 Hz
with amplitude ΔU/μ ≈ 1.2; and d, after the turbulent cloud was allowed
to relax for 1.5 s. The dashed circle in c corresponds to an expansion
energy of k_B T_c/2. In the lower panels, the red lines correspond to the
diamond-like and isotropic distributions depicted in the insets.

Figure 2 | Nonlinear spectroscopy of the lowest axial mode of the BEC
and the route to turbulence. a, b, Small-amplitude oscillations of the centre
of mass (along the z axis, see Fig. 1a) for N = 10(1) × 10⁴. a, Free oscillation
after a 20-ms kick with ΔU/k_B = 1.7 nK. b, Fourier spectrum of the free
oscillation in a (orange line), and displacement after a whole number of
driven oscillations, with driving amplitude ΔU/k_B = 0.3 nK and t_s = 2 s (blue
points). The solid blue line is a fit (see Methods). Error bars in a and b
represent 1σ errors. c, Small-amplitude ω_res versus N for free (orange) and
driven (blue) oscillations with the same ΔU and t_s values as in a and b.


time-of-flight (TOF) expansion, t_TOF ≥ 100 ms. From the images we
extract the centre of mass and momentum distribution of the cloud.
The position of the centre of mass reflects the axial in-trap sloshing,
and the evolution of the momentum distribution reveals the cascade
of excitations from small to large wavevectors k, the so-called direct
cascade.

In Fig. 1b-d we show a qualitative experimental signature of turbulence
with three key examples of TOF images (upper panels) and the
corresponding angular distributions of atoms P(θ) (lower panels). The
initial BEC (Fig. 1b) shows an anisotropic expansion, which is driven
by the conversion of interaction into kinetic energy, and reflects the
shape of the container”®. In sharp contrast, after sufficiently long shaking
the expansion is isotropic (Fig. 1c), with P(θ) exhibiting small fluctuations
around 1/(2π). This is the first qualitative signature of a
kinetic-energy-dominated turbulent state, in which the long-range
coherence of the BEC is destroyed. We stress that this highly
non-equilibrium state is fundamentally different from an equilibrium
non-condensed state, which is also kinetic-energy-dominated and
displays isotropic expansion. The key point is that, in our box trap,
there is a large separation between the initial E_int and k_B T_c. This gives
us access to the regime in which the total (mostly kinetic) energy per
particle E satisfies E_int < E < k_B T_c. In this regime, coherence is
destroyed in the turbulent state, but the corresponding equilibrium
state with the same E is still deeply condensed. In Fig. 1c, the dashed
circle corresponds to an expansion energy of k_B T_c/2, and the average
energy of the atoms is clearly much lower; from the second moment
of the TOF distribution we get E ≈ 0.12 k_B T_c, which in equilibrium
would correspond to a condensed fraction η ≈ 0.7 (ref. 27). Indeed, if
we stop shaking and allow the turbulent gas to relax, a BEC reforms
(Fig. 1d) with the expected η = 0.7(1). For all our studies of the turbulent
state we restrict the shaking amplitude (ΔU ≲ 2μ) and time


Horizontal error bars represent 1σ errors; the ω_res errors are smaller
than the point size. The grey shaded area shows numerical solutions of the
Bogoliubov equations and the red star is the analytical non-interacting
limit. Green lines are based on hydrodynamic approximations (see text).
d, e, Nonlinear response, for N = 8(1) × 10⁴. d, Driven-oscillation signals as
in b, for various ΔU. e, ω_res and linewidth Γ versus ΔU (blue points), and
the corresponding results of GPE simulations (red bands). Inset, TOF
anisotropy A versus ΔU/μ, for t_s = 4 s of resonant driving. All measurements
of the centre of mass were done with t_TOF = 140 ms.


(t_s ≤ 4 s) so that E < 0.25 k_B T_c, which corresponds to an equilibrium
η of > 0.5.
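
The link between the energy per particle and the equilibrium condensed fraction can be illustrated with the ideal uniform Bose gas, for which E/(k_B T_c) = (3/2)[ζ(5/2)/ζ(3/2)](T/T_c)^(5/2) and η = 1 − (T/T_c)^(3/2) below T_c. This is a minimal sketch under that ideal-gas assumption; it is not necessarily the calculation behind the values cited from ref. 27, and the function name is illustrative.

```python
ZETA_3_2 = 2.6123753  # Riemann zeta(3/2)
ZETA_5_2 = 1.3414873  # Riemann zeta(5/2)


def ideal_condensed_fraction(energy_over_kb_tc):
    """Condensed fraction of an ideal uniform Bose gas whose energy per
    particle is given in units of k_B T_c (ideal-gas assumption)."""
    # Below T_c: E / (k_B T_c) = (3/2) * zeta(5/2)/zeta(3/2) * (T/T_c)^(5/2)
    prefactor = 1.5 * ZETA_5_2 / ZETA_3_2
    t_over_tc = (energy_over_kb_tc / prefactor) ** 0.4
    if t_over_tc >= 1.0:
        return 0.0  # at or above T_c there is no condensate
    return 1.0 - t_over_tc ** 1.5


for energy in (0.12, 0.25):
    eta = ideal_condensed_fraction(energy)
    print(f"E = {energy:.2f} k_B T_c -> eta = {eta:.2f}")
```

Under this assumption the two energies quoted in the text give η of roughly 0.67 and 0.49, in line with the stated η ≈ 0.7 and η of about 0.5.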

To see how pumping energy at the largest length scale (with a spa- 
tially uniform force) leads to a turbulent cascade, we perform detailed 
spectroscopy of the lowest-lying axial mode of the BEC (see Fig. 2). 
In contrast to the harmonic trap, for which the dipole mode is fixed 
by the trapping frequency (Kohn’s theorem), in the box it depends on 
interactions, which results in nonlinear behaviour for non-vanishing 
shaking amplitudes. 

We first study the small-amplitude centre-of-mass response for var- 
ious N, using both free and driven oscillations. In the first method, we 
pulse on a constant AU for a short time and let the gas oscillate freely 
in the trap for a variable hold time before releasing it and measuring 
the centre of mass in TOF (Fig. 2a). In the Fourier spectrum of this 
oscillation (orange line in Fig. 2b) we see a single strong peak, with no 
indication that the gradient kick directly couples to other low-lying 
modes. In the second method, we apply a continuous oscillating drive
of (angular) frequency ω and amplitude ΔU ≲ 0.5μ, and perform TOF
measurements after a whole number of drive periods. Similarly to a
driven harmonic oscillator, the displacement of the centre of mass
has a dispersive line shape as a function of ω, vanishing on resonance
(see Methods). As shown in Fig. 2b, the two methods give the same
resonant frequency ω_res.

In Fig. 2c we plot the small-amplitude ω_res versus N, and compare
the data with various theories. The hydrodynamic prediction (solid
green line) is ω_HD = πc/L, where c = √(μ/m) is the speed of sound and
m is the atom mass. This theory assumes that the healing length
ξ = ħ/√(2mμ) (where ħ is the reduced Planck constant) satisfies ξ ≪ L.
It is therefore not applicable in the N → 0 limit, where ω_res = 3π²ħ/
(2mL²) (red star in Fig. 2c) is given by the splitting of the lowest axial
single-particle states. For our largest BEC, L/ξ ≈ 20, but ω_HD is still

Figure 3 | Development of a turbulent cascade. a, Momentum
distribution of the turbulent gas (solid black line), for N = 7(1) × 10⁴,
ΔU/μ = 1.1(1), t_s = 4 s, ω/(2π) = 9 Hz and t_TOF = 100 ms. The vertical red
lines indicate the momentum resolution k_low (left) and the energy sink at
k_high (right); the dashed blue line is a guide to the eye, offset from the data
for clarity. Lower inset, compensated spectrum k_r^(γ−1) ñ(k_r) with γ = 3.5 (in
log-log scale); k_m and k_M define the fitting ranges used in b-d. Upper
inset, steady-state distribution from GPE simulations, for ΔU/μ = 1.
b, Dynamics of ñ(k_r) towards the steady state, for ΔU/μ = 1.1(1). Inset,


observably lower than ω_res. We empirically find that an upper bound
on ω_res (dashed green line in Fig. 2c) is obtained by calculating ω_HD for
an effective BEC volume that excludes the region within ξ of the trap
walls. Finally, we linearize the GPE around the ground-state BEC solution
for our box trap and numerically solve the resultant Bogoliubov
equations (see Methods). These solutions are shown as the grey shaded
area in Fig. 2c, which accounts for the experimental uncertainty in the
box size. We find excellent agreement with the data, without any adjustable
parameters.
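
The two limiting expressions for the lowest axial mode can be put into numbers. The sketch below evaluates the healing length ξ = ħ/√(2mμ), the speed of sound c = √(μ/m), the hydrodynamic frequency ω_HD = πc/L and the non-interacting limit 3π²ħ/(2mL²), assuming the ⁸⁷Rb mass, μ/k_B = 2 nK and L = 27 μm. The function and key names are illustrative, and the Bogoliubov and excluded-volume curves of Fig. 2c are not reproduced.

```python
import math

HBAR = 1.054571817e-34            # reduced Planck constant, J s
K_B = 1.380649e-23                # Boltzmann constant, J/K
M_RB87 = 86.909 * 1.66053907e-27  # mass of a 87Rb atom, kg


def box_mode_scales(mu_over_kb, length, mass=M_RB87):
    """Length and frequency scales for the lowest axial mode of a box-trapped BEC."""
    mu = mu_over_kb * K_B
    xi = HBAR / math.sqrt(2 * mass * mu)   # healing length, m
    c = math.sqrt(mu / mass)               # speed of sound, m/s
    omega_hd = math.pi * c / length        # hydrodynamic prediction, rad/s
    omega_free = 3 * math.pi**2 * HBAR / (2 * mass * length**2)  # N -> 0 limit, rad/s
    return {
        "xi_um": xi * 1e6,
        "L_over_xi": length / xi,
        "f_hd_hz": omega_hd / (2 * math.pi),
        "f_free_hz": omega_free / (2 * math.pi),
    }


# Assumed inputs: mu/k_B = 2 nK and L = 27 um (central values from the text).
scales = box_mode_scales(2e-9, 27e-6)
for name, value in scales.items():
    print(f"{name} = {value:.2f}")
```

For these inputs ξ comes out slightly above 1 μm (L/ξ a little over 20), with ω_HD/(2π) near 8 Hz and a non-interacting limit between 2 and 3 Hz.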

In Fig. 2d, e we show measurements for driven oscillations with
different drive strengths. Increasing ΔU shifts and broadens the resonance,
and both trends are reproduced by our GPE simulations (red
bands in Fig. 2e); for very large ΔU the classical-field GPE approximation
may gradually break down. The line broadening, seen for any
non-zero ΔU, indicates nonlinear coupling to other modes, which
provides the route for the transfer of excitations into other directions
and a direct cascade.

In the inset of Fig. 2e we plot the anisotropy of the TOF expansion
A = (1/2) ∫ |P(θ) − 1/(2π)| dθ (see Methods) for 4 s of resonant driving.
For ΔU ≳ 0.8μ we observe the isotropic expansion (A ≈ 0) that qualitatively
signals turbulence. A key quantitative expectation for an isotropic
turbulent cascade is the emergence of a steady-state power-law
momentum distribution: n(k) ∝ k^(−γ), where γ is a constant”®. Owing to
the line-of-sight integration in absorption imaging, this corresponds
to an in-plane distribution ñ(k) ∝ k^(−(γ−1)).

In Fig. 3 we present our study of ñ(k) observed after a resonant drive.
An isotropic expansion (from an anisotropic container) necessarily
means that the in-trap kinetic energy dominates over the interaction
energy, which in turn means that the TOF expansion can provide an
accurate measure of the in-trap momentum distribution. Specifically,
defining k_r = mr/(ħ t_TOF), where r is distance from the centre of mass
in TOF, ñ(k_r) should closely correspond to the in-trap n(k) (see
Methods, Extended Data Fig. 1). However, this correspondence does
not hold for very low momenta (k_r ≲ k_low = mL/(ħ t_TOF)), owing to the
convolution of the TOF distribution with the initial (in-trap) cloud
shape. The highest momentum in our clouds (k_high = √(2mU_D)/ħ) is set
by the trap depth U_D ≈ k_B × 60 nK, which corresponds to an energy sink.
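
The two wavenumbers bounding the usable momentum range follow directly from the expressions above. The sketch below evaluates k_low = mL/(ħ t_TOF) and k_high = √(2mU_D)/ħ, assuming the ⁸⁷Rb mass, L = 27 μm, t_TOF = 100 ms and U_D/k_B = 60 nK; the function name is illustrative.

```python
import math

HBAR = 1.054571817e-34            # reduced Planck constant, J s
K_B = 1.380649e-23                # Boltzmann constant, J/K
M_RB87 = 86.909 * 1.66053907e-27  # mass of a 87Rb atom, kg


def momentum_window(length, t_tof, trap_depth_over_kb, mass=M_RB87):
    """Return (k_low, k_high) in inverse metres."""
    k_low = mass * length / (HBAR * t_tof)  # resolution limit set by in-trap size
    k_high = math.sqrt(2 * mass * trap_depth_over_kb * K_B) / HBAR  # trap-depth cutoff
    return k_low, k_high


# Assumed inputs: L = 27 um, t_TOF = 100 ms, U_D/k_B = 60 nK (from the text).
k_low, k_high = momentum_window(27e-6, 0.100, 60e-9)
print(f"k_low  = {k_low * 1e-6:.2f} per um")
print(f"k_high = {k_high * 1e-6:.2f} per um")
print(f"k_high / k_low = {k_high / k_low:.1f}")
```

With these inputs the window spans roughly one decade in k, which bounds the range over which a power law can be observed.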

In Fig. 3a we show an example of ñ(k_r), for ΔU/μ = 1.1(1) and t_s = 4 s
(black line in the main panel and lower inset), obtained by averaging

total atom population for k_r < k_m (the low-k ‘source’; green), and for
k_m < k_r < k_M (in the cascade region; yellow). At long times (solid lines)
Ṅ_source = −3.6(1.5) atoms ms⁻¹, whereas Ṅ_cascade = −0.2(3) atoms ms⁻¹ is
consistent with zero. All populations are corrected for losses due to the
collisions with the background gas in the vacuum chamber (see Methods).
c, Exponent γ versus shaking time in experiment (blue, ΔU/μ = 1.1(1))
and simulations (red, ΔU/μ = 1). d, Exponent γ versus shaking amplitude
in experiment (blue) and simulations (red), for t_s = 4 s.


over 20 images and also performing an azimuthal average. Vertical red
lines indicate the k_low and k_high boundaries. Away from these boundaries
we observe a power-law behaviour, with γ = 3.5. This behaviour is
even more visually evident in the lower inset, in which we plot
k_r^(γ−1) ñ(k_r), with γ = 3.5. In the top inset in Fig. 3a we show the result
of GPE simulations (for ΔU/μ = 1), which also exhibit a power-law
distribution. Moreover, the experiment and simulations are consistent
with the same value of γ.

In Fig. 3b we present the evolution of ñ(k_r) towards the turbulent
steady state, as the shaking time is increased. In the inset we show (on
a linear scale) the total atom populations in the low-k ‘source’ region
k_r < k_m and in the range k_m < k_r < k_M where the power-law distribution
is established in steady state (k_m and k_M are boundaries defined in the
lower inset of Fig. 3a). Initially there is a net transfer of population from
the source to the cascade region. The population growth in the cascade 
region means that the population flux through this k-space range is not 
constant at these early times. However, once the steady state is 
established, the population in the cascade k range saturates at a constant 
value, while the source is still slowly depleted. This is indeed what is 
expected for a direct cascade, in which a constant, k-independent 
population flux passes from the source, through the cascade range, to 
the high-k sink; formally, this population flux, for a given energy flux, 
should tend to zero as the sink is moved towards infinite energy”®. (For 
a non-infinite-energy sink, one strictly speaking has a quasi-steady 
state, because at very long times the source would be too depleted to 
support a constant-flux cascade.) 

We further cross-validate our experiments and first-principles calculations
by fitting the cascade exponent γ in the range k_m < k_r < k_M.
In Fig. 3c we show that, for ΔU/μ ≈ 1, the experiment and simulations
exhibit very similar evolution with the shaking time, and reach
a steady-state value of γ after t_s ≳ 2 s. In Fig. 3d we plot the measured
and simulated γ values versus the shaking amplitude for fixed t_s = 4 s.
Here we see that the steady-state value of γ is essentially independent
of ΔU, reinforcing the robustness of our conclusions (for small ΔU
the steady state is not reached for t_s = 4 s; see also the inset of Fig. 2e).
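
One simple way to extract a cascade exponent from an in-plane distribution is a straight-line fit in log-log coordinates over the window k_m < k_r < k_M. The sketch below is an assumption about how such a fit could be done, not the fitting procedure used for Fig. 3c, d. It uses synthetic, noise-free data with γ = 3.5 and a hypothetical fit window, and it adds 1 to the fitted slope because the in-plane distribution scales as k^(−(γ−1)).

```python
import math


def fit_power_law_exponent(k_values, n_values):
    """Least-squares slope of log(n) versus log(k).

    Returns p such that n is approximately proportional to k**(-p).
    """
    xs = [math.log(k) for k in k_values]
    ys = [math.log(n) for n in n_values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    return -s_xy / s_xx


# Synthetic in-plane distribution (not experimental data), arbitrary units.
gamma_true = 3.5
k_window = [1.0 + 0.1 * i for i in range(21)]  # hypothetical k_m = 1, k_M = 3
n_in_plane = [k ** (-(gamma_true - 1)) for k in k_window]

# The line-of-sight integration lowers the exponent by one, so add it back.
gamma_fit = fit_power_law_exponent(k_window, n_in_plane) + 1
print(f"fitted gamma = {gamma_fit:.3f}")
```

On real images the fitted value depends on the choice of window and on noise, so the fitting range should be kept away from k_low and k_high.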

We lastly discuss our findings in the context of previous theoretical
work. The γ we observe in both the experiment and simulations is close
to one of the scarce analytical predictions: the Kolmogorov-Zakharov
direct-cascade exponent γ = 3, for the weak-wave turbulence of a
compressible superfluid”®. (This result is distinct from the prediction!”
for the Kolmogorov energy spectrum of incompressible-flow-dominated
turbulence, E(k) ∝ k^(−5/3).) This prediction is based on an idealized
theory that starts with the GPE, but neglects the role of vortices
and assumes weak interactions between the waves. Our simulations
show that vortices are present in the system, but the value of γ suggests
that they do not play a quantitatively important part in the turbulent
cascade (observed at wavenumbers kξ ≳ 1), and the Kolmogorov-Zakharov
theory is a reasonable approximation. Consistent with this, in
simulations we find that, in the relevant k range, the compressible-flow
contribution to the energy dominates over that of the incompressible
flow (see Methods, Extended Data Fig. 2). The small difference between
γ = 3.5 and the approximate γ = 3 could arise as a result of several (interlinked)
factors, such as a residual role of vortices, the non-negligible
incompressible-flow energy, the fact that in reality the interactions
between the waves are not necessarily weak!>”8, and the increasing
importance of quantum pressure in the GPE with increasing k. The
experimental flexibility offered by atomic gases, in particular the pos- 
sibility to tune the strength of nonlinearity via Feshbach resonances, 
might enable better understanding of the applicability of the approxi- 
mate analytical predictions, and the limitations of the classical-field 
methods. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper.

METHODS


Experimental system. The BEC of ⁸⁷Rb atoms is produced in a quasi-uniform
potential of a can-shaped dark optical dipole trap (see Fig. 1). The repulsive trap
walls are sculpted using 532-nm laser light and a phase-imprinting spatial light
modulator. They are formed by one circular tube beam (propagating along z) and
two thin sheet beams that act as end caps’. At the end of evaporative cooling the
trap depth is approximately k_B × 10 nK and the condensed fraction is η > 0.9. We
then slowly (over 700 ms) raise the trap depth to U_D ≈ k_B × 60 nK, which does not
result in any observable drop in η. Our atom number is calibrated to within 10%
from the critical temperature for condensation’. The gradient of the modulus of
the magnetic field along z, used to shake the cloud, is calibrated by pulsing it on
for a short time δt just after releasing the cloud from the trap and measuring the
resulting velocity kick ΔU δt/(mL) in TOF.

Phase-shift measurement of the resonance. The position of the centre of mass of
the cloud in TOF is v t_TOF, where v is its velocity at the time of release. In analogy
with a driven damped harmonic oscillator, we assume that for a driving force
proportional to sin(ωt) in steady state v(t) = A_ω ω cos(ωt + φ_ω), where A_ω and φ_ω are
the ω-dependent (in-trap) displacement amplitude and phase shift. For a shaking
time t_s such that ωt_s is a multiple of 2π, the response v(t_s) = A_ω ω cos(φ_ω) vanishes
on resonance, and more generally

$$ v(t_s) \propto \frac{\omega\,(\omega^2 - \omega_{\rm res}^2)}{(\omega^2 - \omega_{\rm res}^2)^2 + \Gamma^2 \omega^2} \qquad (1) $$

where Γ is the damping rate. In practice we fix t_s = 2 s and make measurements
at discrete points ω = jΔω, where Δω = 2π × 0.5 Hz and j is an integer. We then
use Equation (1) to fit the data (see Fig. 2b, d) with ω_res and Γ as free parameters.
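
The dispersive line shape of a driven damped oscillator, sampled after a whole number of drive periods, can be sketched numerically. The block below evaluates ω(ω² − ω_res²)/[(ω² − ω_res²)² + Γ²ω²] on the grid ω = jΔω with Δω = 2π × 0.5 Hz. The values of ω_res and Γ are hypothetical, chosen only for illustration, and the overall sign and scale are arbitrary because the relation is a proportionality.

```python
import math


def driven_response(omega, omega_res, damping):
    """Dispersive response of a driven damped oscillator, up to a constant factor."""
    detuning = omega**2 - omega_res**2
    return omega * detuning / (detuning**2 + damping**2 * omega**2)


# Hypothetical parameters for illustration only.
omega_res = 2 * math.pi * 9.0    # resonance at 9 Hz
damping = 2 * math.pi * 1.0      # damping rate
delta_omega = 2 * math.pi * 0.5  # grid spacing: whole periods within t_s = 2 s

# Sample the response at omega = j * delta_omega, as in the measurement grid.
samples = [
    (j * delta_omega / (2 * math.pi), driven_response(j * delta_omega, omega_res, damping))
    for j in range(1, 41)
]
for freq_hz, response in samples[14:22]:
    print(f"{freq_hz:5.1f} Hz  {response:+.4e}")
```

The response changes sign at ω = ω_res, which is why the zero crossing rather than a peak marks the resonance in this type of measurement.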
Numerical methods. We implement a three-dimensional numerical simulation
of the Gross-Pitaevskii equation (GPE)

$$ i\hbar \frac{\partial \psi}{\partial t} = \left[ -\frac{\hbar^2}{2m}\nabla^2 + V_{\rm ext}(\mathbf{r}, t) + g|\psi|^2 \right]\psi \qquad (2) $$

where the coupling constant is g = 4πħ²a_s/m, with a_s the s-wave scattering length,
and V_ext(r, t) is an external potential. We have V_ext(r, t) = V_box(r) + V_s(r, t),
where V_box(r) is the (static) box-trap potential and V_s(r, t) = ΔU sin(ωt) z/L is the
(time-varying) shaking potential. We perform numerical simulations on a cubic
grid of 256³ points. Using a symmetrized split-step Fourier method we evolve the
Bose field in time steps of 10 μs for up to 4 s. The calculations are performed at FP32
precision on an NVIDIA GeForce GTX TITAN X graphics card, and we achieve a
running time of under 30 min for simulating each 1 s of dynamics.
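
The idea behind a symmetrized split-step Fourier scheme can be shown in one dimension: a half step of kinetic evolution in Fourier space, a full step of potential and interaction evolution in real space, then another kinetic half step. The sketch below is a simplified illustration and not the three-dimensional GPU code described above. It assumes dimensionless units (ħ = m = 1), periodic boundaries, double precision, and a hypothetical finite square well with an oscillating tilt; all names and parameter values are illustrative.

```python
import numpy as np


def split_step_gpe_1d(psi, x, dt, n_steps, potential, g):
    """Strang-split evolution of i dpsi/dt = [-(1/2) d2/dx2 + V(x, t) + g|psi|^2] psi."""
    dx = x[1] - x[0]
    k = 2.0 * np.pi * np.fft.fftfreq(x.size, d=dx)
    half_kinetic = np.exp(-0.25j * k**2 * dt)  # exp(-i (k^2 / 2) (dt / 2))
    for step in range(n_steps):
        t_mid = (step + 0.5) * dt
        psi = np.fft.ifft(half_kinetic * np.fft.fft(psi))     # kinetic half step
        v_total = potential(x, t_mid) + g * np.abs(psi) ** 2  # |psi| is unchanged by this step
        psi = psi * np.exp(-1j * v_total * dt)                # potential + interaction step
        psi = np.fft.ifft(half_kinetic * np.fft.fft(psi))     # kinetic half step
    return psi


def shaken_well(x, t, amplitude=0.5, omega=1.0, half_width=8.0):
    """Hypothetical finite square well plus an oscillating linear tilt."""
    walls = 50.0 * (np.abs(x) > half_width)
    return walls + amplitude * np.sin(omega * t) * x / (2.0 * half_width)


x = np.linspace(-12.0, 12.0, 256, endpoint=False)
dx = x[1] - x[0]
psi0 = np.exp(-x**2 / 8.0).astype(complex)
psi0 /= np.sqrt(np.sum(np.abs(psi0) ** 2) * dx)  # normalize to unit norm

psi1 = split_step_gpe_1d(psi0, x, dt=1e-3, n_steps=500, potential=shaken_well, g=1.0)
norm_after = np.sum(np.abs(psi1) ** 2) * dx
print(f"norm after evolution = {norm_after:.12f}")
```

Each sub-step multiplies the field by a pure phase (in real or Fourier space), so the norm is conserved to rounding error regardless of the time step; accuracy of the dynamics still requires a sufficiently small dt.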

Bogoliubov equations for the box trap. The starting point for the analysis of
the linear response of the BEC to external perturbations is the GPE in Equation
(2). We start by expanding ψ(r, t) around ψ_0(r), the ground state in the potential
V_ext(r, t) = V_box(r):

$$ \psi(\mathbf{r}, t) = e^{-i\mu t/\hbar}\left[ \psi_0(\mathbf{r}) + u(\mathbf{r})\,e^{-i\omega t} - v^*(\mathbf{r})\,e^{i\omega t} \right] $$

where the asterisk denotes the complex conjugate. Linearizing with respect to
u and v leads to the Bogoliubov equations:

$$ \begin{pmatrix} \mathcal{L} & -g n_0(\mathbf{r}) \\ g n_0(\mathbf{r}) & -\mathcal{L} \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \hbar\omega \begin{pmatrix} u \\ v \end{pmatrix} \qquad (3) $$

where

$$ \mathcal{L} = -\frac{\hbar^2}{2m}\nabla^2 + V_{\rm box}(\mathbf{r}) + 2 g n_0(\mathbf{r}) - \mu $$

and n_0(r) = |ψ_0(r)|² is the ground-state density. The set of eigenvalues ħω forms
the spectrum of elementary excitations, and u and v give the corresponding
eigenmodes. For periodic boundary conditions, n_0 is constant and the usual Bogoliubov
spectrum is recovered”*. However, for fixed boundary conditions, the GPE cannot
be solved analytically and n_0(r) cannot be written in a simple closed form (it is not
even separable in cylindrical coordinates). We solve the Bogoliubov equations by
first computing no(r) from an imaginary-time propagation of the GPE, and then 
numerically solving Equation (3). It is convenient to work in the basis of the 
free-particle eigenstates in a cylindrical box, so that the boundary conditions are 
automatically satisfied. Because we focus on the longitudinal modes, we restrict 
the basis to azimuthally symmetric functions. Our results for the frequency of the 
lowest-lying (antisymmetric) mode along z are shown in Fig. 2c as a grey shaded 
area (taking into account the uncertainty in the box size). In the limit of vanishing 
shaking amplitude, direct GPE simulations give the same results as the Bogoliubov 
approach. 


Anisotropy analysis. To quantify the anisotropy of the TOF expansion, seen
in the atomic distribution after a long t_TOF, we start with the column-density
distribution in the TOF image ñ(y, z) = ∫ n(r) dx, where n(r) is the
three-dimensional distribution and x is the imaging axis. We then define the angular
density distribution:

$$ P(\theta) = \frac{1}{N} \int_{r_1}^{r_2} \tilde{n}(r\cos\theta,\, r\sin\theta)\, r\, dr \qquad (4) $$

where the polar origin is set at the centre of mass of ñ(y, z) and N is the total atom
number in the shell [r_1, r_2], so that ∫ P(θ) dθ = 1 over the full angle 2π. An isotropic
distribution corresponds to the uniform P_iso(θ) = (2π)⁻¹. To quantify the anisotropy
as a deviation from this uniform distribution, we introduce a simple heuristic measure:

$$ A = \frac{1}{2} \int_0^{2\pi} \left| P(\theta) - \frac{1}{2\pi} \right| d\theta $$

so that A = 0 for P_iso and A → 1 for sharply-peaked distributions. For our pure BEC,
the diamond-shaped TOF distribution is close to the idealized square-shaped
distribution depicted in the insets of Fig. 1b, d, for which (taking r_1 = 0 and r_2 → ∞)
the angular distribution is P(θ) = P_0(θ mod π/2), with

$$ P_0(\theta) = \frac{1}{8} \begin{cases} \cos^{-2}\theta & \text{for } 0 < \theta < \pi/4 \\ \sin^{-2}\theta & \text{for } \pi/4 < \theta < \pi/2 \end{cases} $$

For this distribution, A ≈ 9%. In experiments, any imaging imperfections lead to a
positive A; for an equilibrium thermal gas (η = 0), which is known to be isotropic,
we observe a small residual A ≈ 3%, and define all our experimental values of A
from this baseline. For the inset of Fig. 2e, the radial integration in Equation (4) is
performed in the shell defined by r_1 = ħk_low t_TOF/m and r_2 = ħk_high t_TOF/m
(corresponding to the vertical red lines in Fig. 3a).
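
The quoted anisotropy of the idealized square-shaped distribution can be checked by direct numerical integration. The sketch below assumes the measure A = (1/2)∫|P(θ) − 1/(2π)|dθ and the angular distribution of a uniformly filled square, P_0(θ) = 1/(8cos²θ) for 0 < θ < π/4 and 1/(8sin²θ) for π/4 < θ < π/2, repeated with period π/2. The function names and the midpoint-rule integration are illustrative choices.

```python
import math


def square_angular_distribution(theta):
    """P(theta) for a uniformly filled square, taking r1 = 0 and r2 -> infinity."""
    t = theta % (math.pi / 2)
    if t < math.pi / 4:
        return 1.0 / (8.0 * math.cos(t) ** 2)
    return 1.0 / (8.0 * math.sin(t) ** 2)


def anisotropy(distribution, n_points=20000):
    """A = (1/2) * integral over [0, 2*pi) of |P(theta) - 1/(2*pi)| (midpoint rule)."""
    d_theta = 2 * math.pi / n_points
    total = 0.0
    for i in range(n_points):
        theta = (i + 0.5) * d_theta
        total += abs(distribution(theta) - 1 / (2 * math.pi))
    return 0.5 * total * d_theta


a_square = anisotropy(square_angular_distribution)
a_iso = anisotropy(lambda theta: 1 / (2 * math.pi))
print(f"A(square)    = {a_square:.4f}")
print(f"A(isotropic) = {a_iso:.4f}")
```

The square distribution gives A of about 0.09, matching the quoted 9%, while the uniform distribution gives zero.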

Measurement of the momentum distribution. The measurement of the momentum
distribution using the TOF technique requires a kinetic-energy-dominated
state. We assess the validity of this measurement by comparing it to Bragg
spectroscopy”°, which can provide a faithful measurement of the momentum
distribution even if the interaction energy is dominant over the kinetic energy. We
shine onto the trapped cloud two off-resonant laser beams with wavevectors k_1
and k_2, detuned from each other by a frequency Δν, such that the resultant 1D
Bragg-diffraction optical lattice is aligned with the axis of the box trap, k_1 − k_2 ∝ ẑ.
The angle between the beams is such that the recoil energy of the diffracted atoms
E_r ≈ k_B × 320 nK is larger than the trap depth (U_D ≈ k_B × 60 nK), allowing the
diffracted atoms to escape, and also much larger than the spread of energies in the
trapped gas. Measuring the fraction of diffracted atoms as a function of Δν yields
the 1D momentum distribution along ẑ, n_1D(k_z), which is related to the planar
distribution ñ (measured in TOF) by an Abel transform:

$$ n_{\rm 1D}(k_z) = 2 \int_{|k_z|}^{\infty} \frac{\tilde{n}(k)\, k\, dk}{\sqrt{k^2 - k_z^2}} $$

We use a long (20 ms) and low-power Bragg pulse to minimize Fourier broadening
while always keeping the diffracted fraction below 15%. To compare the TOF and
Bragg measurements we integrate our TOF images along ŷ.

In Extended Data Fig. 1a, we apply both methods to the initial state, a quasi-pure 
BEC. In this case the TOF measurement (solid green line) overestimates the width 
of the momentum distribution, owing to the importance of interactions during 
the expansion; the Bragg spectrum agrees well with the expected Heisenberg- 
limited distribution (dashed red line)*°. However, as shown in Extended Data 
Fig. 1b, in the relevant case of the kinetic-energy-dominated turbulent state, 
the Bragg and TOF measurements are in excellent agreement, validating our 
assumptions. 

Background-gas losses. In the absence of shaking, the atom population in the trap
slowly decays owing to collisions with the residual background gas in the vacuum
chamber. These one-body losses are k-independent and described by an exponential
decay with a time constant that we measured to be τ_vac = 13 s. For analysing the
population dynamics in the inset of Fig. 3b, we corrected all populations for this
background loss by multiplying them by a common factor exp(t_s/τ_vac).
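
The size of this correction is easy to gauge. The sketch below applies the factor exp(t_s/τ_vac) with τ_vac = 13 s; the function name and the example population are illustrative.

```python
import math

TAU_VAC = 13.0  # s, background-gas lifetime quoted in the text


def correct_for_background_loss(population, shaking_time, tau_vac=TAU_VAC):
    """Undo one-body exponential decay accumulated over the shaking time."""
    return population * math.exp(shaking_time / tau_vac)


# Correction factor for the longest shaking time used (t_s = 4 s).
factor_4s = correct_for_background_loss(1.0, 4.0)
print(f"correction factor at t_s = 4 s: {factor_4s:.3f}")

# Hypothetical measured population of 1000 atoms in some k range after 2 s.
print(f"corrected population: {correct_for_background_loss(1000.0, 2.0):.0f}")
```

Because the losses are k-independent, the same factor applies to the source and cascade populations, so the correction rescales both without changing their ratio.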
Numerical simulations of the turbulent cascade. In Extended Data Fig. 2a we
show simulations of the dynamics of the in-plane momentum distribution in a
shaken gas. Here ñ(k) is computed from the spatial Fourier transform of the
matter-wave field ψ(r, t_s). We observe that with increasing t_s the same power-law
behaviour gradually extends from large to ever smaller length scales, as expected
for a direct cascade.

Following the procedure outlined in ref. 17, we also study the fluid-dynamical
kinetic-energy spectrum E(k). We start by computing w̃(k), the Fourier transform
of the flow field

$$ \mathbf{w}(\mathbf{r}) = \frac{\hbar}{m}\,|\psi(\mathbf{r})|\,\nabla\phi(\mathbf{r}) $$

where φ(r) is the phase of ψ(r) and we omit the time label for brevity. Summing
|w̃(k)|² over all momenta with |k| = k gives the total E(k). Decomposing w̃(k) into
longitudinal and transverse components splits E(k) into the compressible- (E_c) and
incompressible-flow (E_i) contributions, respectively. In Extended Data Fig. 2b we
plot the ratio E_c(k)/E_i(k). We find that E_c(k) dominates in the k range k_m < k < k_M,
where the power-law momentum distribution is observed in both experiments
and simulations; the same numerical observation was independently made by K.
Fujimoto and M. Tsubota (personal communication). This supports the view that
waves play a more important part than vortices in the turbulent cascade. Also, the
vortices have a core size of approximately ξ, so it qualitatively makes sense that
their contribution is not important when k ≳ 1/ξ.
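
The split into longitudinal and transverse parts is a projection in Fourier space: the component of w̃(k) parallel to k is the compressible part and the remainder is the incompressible part. The sketch below illustrates this on a two-dimensional periodic grid with two analytic test fields. It is a simplified illustration rather than the procedure of ref. 17: it omits physical prefactors, sums over all k instead of binning in shells of |k|, and assigns the k = 0 mode to the transverse part by convention. All names are illustrative.

```python
import numpy as np


def split_flow_energy(wx, wy, dx):
    """Return (E_c, E_i): summed |w_k|^2 of the longitudinal and transverse
    parts of a periodic 2D vector field sampled on a square grid."""
    ny, nx = wx.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dx)
    kx_grid, ky_grid = np.meshgrid(kx, ky)  # axis 0 is y, axis 1 is x
    k_squared = kx_grid**2 + ky_grid**2
    k_squared[0, 0] = 1.0                   # k = 0 has no direction; avoid 0/0
    wx_k, wy_k = np.fft.fft2(wx), np.fft.fft2(wy)
    # Longitudinal (compressible) part: projection of w_k onto k.
    k_dot_w = (kx_grid * wx_k + ky_grid * wy_k) / k_squared
    cx_k, cy_k = kx_grid * k_dot_w, ky_grid * k_dot_w
    # Transverse (incompressible) part: the remainder.
    ix_k, iy_k = wx_k - cx_k, wy_k - cy_k
    e_c = np.sum(np.abs(cx_k) ** 2 + np.abs(cy_k) ** 2)
    e_i = np.sum(np.abs(ix_k) ** 2 + np.abs(iy_k) ** 2)
    return e_c, e_i


n = 64
coords = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(coords, coords)  # X varies along axis 1, Y along axis 0
dx = coords[1] - coords[0]

# Test field 1: gradient of sin(x) cos(2y), purely compressible.
ec_grad, ei_grad = split_flow_energy(np.cos(X) * np.cos(2 * Y),
                                     -2 * np.sin(X) * np.sin(2 * Y), dx)
# Test field 2: derived from the stream function sin(x) sin(y), divergence-free.
ec_rot, ei_rot = split_flow_energy(np.sin(X) * np.cos(Y),
                                   -np.cos(X) * np.sin(Y), dx)
print(f"gradient field:        E_i / E_c = {ei_grad / ec_grad:.1e}")
print(f"divergence-free field: E_c / E_i = {ec_rot / ei_rot:.1e}")
```

For a gradient field essentially all of the energy ends up in the compressible part, and for a divergence-free field in the incompressible part, which is the behaviour the decomposition relies on.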

Data availability. Source Data for Figs 1-3 and Extended Data Figs 1, 2 are available
online.

Extended Data Figure 1 | Momentum distributions from TOF and
Bragg techniques. a, b, Comparison of n_1D(k_z) obtained using TOF
expansion (solid lines) and Bragg spectroscopy (points), in the case of the
initial, quasi-pure BEC (a) and the turbulent gas (b). The red dashed line
in a corresponds to the Heisenberg-limited momentum distribution. All
distributions are normalized to unity (∫ n_1D d(k_z ξ) = 1), without any
adjustable parameters.

Extended Data Figure 2 | Turbulent cascade in numerical simulations.
a, In-plane momentum distribution ñ(k) for various shaking times t_s.
b, Ratio of the compressible- (E_c) to incompressible-flow (E_i) components
of the fluid-dynamical kinetic energy, with the colours corresponding to
the shaking times in a. The simulation parameters for both panels are
N = 8 × 10⁴, shaking frequency ω/(2π) = 9 Hz and shaking amplitude
ΔU = μ.

Metal-organic frameworks as selectivity regulators
for hydrogenation reactions 


Meiting Zhao, Kuo Yuan!*, Yun Wang’, Guodong Li, Jun Guo!, Lin Gu’, Wenping Hu*°, Huijun Zhao* & Zhiyong Tang! 


Owing to the limited availability of natural sources, the widespread 
demand of the flavouring, perfume and pharmaceutical industries 
for unsaturated alcohols is met by producing them from
α,β-unsaturated aldehydes, through the selective hydrogenation
of the carbon-oxygen group (in preference to the carbon-carbon
group)’. However, developing effective catalysts for this
transformation is challenging”~’, because hydrogenation of the
carbon-carbon group is thermodynamically favoured’. This difficulty
is particularly relevant for one major category of heterogeneous 
catalyst: metal nanoparticles supported on metal oxides. These 
systems are generally incapable of significantly enhancing the 
selectivity towards thermodynamically unfavoured reactions, because 
only the edges of nanoparticles that are in direct contact with the 
metal-oxide support possess selective catalytic properties; most of 
the exposed nanoparticle surfaces do not” “. This has inspired the 
use of metal-organic frameworks (MOFs) to encapsulate metal 
nanoparticles within their layers or inside their channels, to influence 
the activity of the entire nanoparticle surface while maintaining 
efficient reactant and product transport owing to the porous nature of 
the material'*-'°. Here we show that MOFs can also serve as effective
selectivity regulators for the hydrogenation of α,β-unsaturated
aldehydes. Sandwiching platinum nanoparticles between an inner
core and an outer shell composed of an MOF with metal nodes of
Fe³⁺, Cr³⁺ or both (known as MIL-101; refs 19-21) results in stable
catalysts that convert a range of α,β-unsaturated aldehydes with
high efficiency and with significantly enhanced selectivity towards 
unsaturated alcohols. Calculations reveal that preferential interaction 
of MOF metal sites with the carbon-oxygen rather than the carbon- 
carbon group renders hydrogenation of the former by the embedded 
platinum nanoparticles a thermodynamically favoured reaction. We 
anticipate that our basic design strategy will allow the development of 
other selective heterogeneous catalysts for important yet challenging 
transformations. 

The coordinatively unsaturated metal sites (CUSs) inside MOFs can 
be readily tuned to adjust interactions between MOFs and reactants, 
to activate targeted chemical bonds in reactants and thereby to lower 
the reaction-energy barrier of the desired chemical transformation”. 
We explore this concept with MOF sandwich nanostructures that are 
then used as hydrogenation catalysts (Supplementary Fig. 1). The 
nanostructures contain a layer of platinum (Pt) nanoparticles encapsulated
between a core and a shell made of MIL-101, which contains
either Fe³⁺ or Cr³⁺ trimers, connected with 1,4-benzenedicarboxylate
(BDC) linkers (Fig. 1a and Supplementary Note 1)?! All synthesized
products are of uniform octahedral shape, with 2.8-nm Pt nanoparticles
immobilized between the MIL-101 core and shell (Fig. 1b-l and
Supplementary Figs 2-13). Catalytic performance is evaluated using
five typical sandwich nanostructures: two MIL-101(Fe)@Pt@MIL-101(Fe)
systems with shell thicknesses of about 9.2 nm and 22.0 nm
(denoted as MIL-101(Fe)@Pt@MIL-101(Fe)^9.2 and
MIL-101(Fe)@Pt@MIL-101(Fe)^22.0; Fig. 1b-e and Supplementary Figs 8, 9); one
MIL-101(Cr)@Pt@MIL-101(Cr) with a shell thickness of around
5.1 nm (denoted as MIL-101(Cr)@Pt@MIL-101(Cr)^5.1; Fig. 1f, g and
Supplementary Fig. 10); and two MIL-101(Cr)@Pt@MIL-101(Fe)
systems with shell thicknesses of around 2.9 nm and 8.8 nm (denoted
as MIL-101(Cr)@Pt@MIL-101(Fe)^2.9 and MIL-101(Cr)@Pt@MIL-101(Fe)^8.8;
Fig. 1h-l, Supplementary Figs 11-13 and Supplementary
Note 2). All of the materials contain single crystalline MIL-101 with
a face-centred-cubic (fcc) structure, and a lattice fringe of 2.6 nm that
corresponds to the (222) planes of MIL-101 (ref. 19; Fig. 1c-k and
Supplementary Figs 14-16). The embedded Pt nanoparticles exhibit
high crystallinity, with an interplanar spacing of 0.23 nm that corresponds
to the (111) planes of fcc platinum (Fig. 1e, k, insets).

Table 1 summarizes the results obtained when using the sandwich 
MIL-101@Pt@MIL-101 nanostructures as catalysts for the selective 
hydrogenation of cinnamaldehyde (A) to cinnamyl alcohol (B), along 
with results for Pt nanoparticles, MIL-101 and MIL-101@Pt as controls
(see also Fig. 1, Supplementary Figs 2-7, and Supplementary Tables 1, 2). 
Because the C=O and C=C groups in A are both possible hydro- 
genation targets, we expect a product mixture containing the targeted 
cinnamyl alcohol (B), as well as hydrocinnamaldehyde (C) and phenyl 
propanol (D). For a more meaningful evaluation of selectivity for the 
desired product B, we first compared the catalysts’ performances at the 
same conversion level (about 45%) of A. We found no noticeable hydrogenation
of A by MIL-101 (Table 1, entry 9), whereas Pt nanoparticles
catalyse this reaction with a turnover frequency (TOF) of 372.4 h⁻¹
and a selectivity for B of only around 18.3% (Table 1, entry 8), consistent
with density functional theory (DFT) predictions (Supplementary
Fig. 17).

The third controls, the MIL-101@Pt nanostructures, exhibit dramatically
increased selectivity for B (Table 1, entries 6 and 7), with
86.4% for MIL-101(Fe)@Pt and 44.0% for MIL-101(Cr)@Pt. This
remarkable promotion of selective hydrogenation of the C=O bond
by MIL-101 is probably due to the CUSs in MIL-101 acting as Lewis
acid sites that interact with the C=O bond and activate it”*. A Fourier
transform infrared (FTIR) survey shows an obvious redshift of the
ν(C=O) bond of A after mixing with MIL-101, while the ν(C=C) bond
remains unaltered, confirming a selective interaction between the
C=O bond of A and MIL-101 (Supplementary Fig. 18).

DFT calculations with consideration of spin-polarization effect— 
aimed at understanding the origin of the interaction between MIL- 
101 and A—show that the five-coordinated metal nodes in the 
Fe30Cl(COO)¢6H20 or Cr30Cl(COO)¢H20 trimers of supertetrahe- 
dral MIL-101 cells serve as active sites that interact with the C=O 
group of A (Supplementary Fig. 19)!°. The calculated adsorption 
energy of A over the Fe3OCl(COO)6H2O and Cr3OCl(COO)6H2O
trimers is −1.26 eV and −1.01 eV, respectively (Fig. 2); that is, binding
between the trimers and A through Fe–O or Cr–O interactions
is thermodynamically favoured, with the interaction between A and 
Figure 1 | Sandwich MIL-101@Pt@MIL-101 nanostructures. a, Synthetic
route to generating sandwich MIL-101@Pt@MIL-101, comprising Pt 
nanoparticles (NPs) sandwiched between a core and a shell of MIL-101. 

b, d, Transmission electron microscopy (TEM) images of MIL-101(Fe)@ 
Pt@MIL-101(Fe)*?° (b) and MIL-101(Fe)@Pt@MIL-101(Fe)?? (d). 

c, e, High-resolution TEM (HR-TEM) images of MIL-101(Fe)@Pt@MIL- 
101(Fe)?*° (c) and MIL-101(Fe)@Pt@MIL-101(Fe)?? (e). f, TEM image 

of MIL-101(Cr)@Pt@MIL-101(Cr)*". g, HR-TEM image of MIL-101(Cr)@ 
Pt@MIL-101(Cr)*!. h, j, TEM images of MIL-101(Cr)@Pt@MIL- 
101(Fe)** (h) and MIL-101(Cr)@Pt@MIL-101(Fe)*? (j). i, k, HR-TEM 


Fe3OCl(COO)6H2O being the stronger one (Supplementary Figs
20-22 and Supplementary Note 3). The calculated reaction energies
for hydrogenation of A to B or C over MIL-101(Fe)@Pt are
−2.15 eV and −1.89 eV (Fig. 2a), indicating that the formation of B
(selectivity approximately 86.4%) is energetically favoured over that 
of C (selectivity roughly 9.6%). In the case of MIL-101(Cr)@Pt, the 
corresponding reaction energies for B and C formation are −0.69 eV
and −0.92 eV, respectively (Fig. 2b), indicating a smaller difference
in the selectivity of B (roughly 44.0%) over C (about 40.0%). These 
results confirm the experimental observation that the CUSs in MIL- 
101 preferentially interact with the C=O group (rather than the C=C 
group) of A and activate it, giving rise to the improved selectivity for 
the formation of product B. 

We also note that at a conversion level of 45%, TOF values are 
122.1 h⁻¹ for MIL-101(Fe)@Pt and 372.4 h⁻¹ for MIL-101(Cr)@Pt
(Table 1, entries 6 and 7), indicating that MIL-101(Cr) does not affect 


images of MIL-101(Cr)@Pt@MIL-101(Fe)** (i) and MIL-101(Fe)@Pt@
MIL-101(Fe)? (k). Insets in b, d, f, h, j, TEM images of a single MIL- 
101@Pt@MIL-101 nanostructure. Insets in c, e, g, i, k, corresponding fast
Fourier transform (FFT) images of MIL-101@Pt@MIL-101 (top right) and 
HR-TEM images of Pt NPs (bottom right). l, High-angle annular dark-field
scanning transmission electron microscopy (HAADF-STEM) image
(top left) and corresponding energy-dispersive X-ray spectroscopy (EDS) 
elemental mapping images of a cross-section of single MIL-101(Cr)@Pt@ 
MIL-101(Fe)*”. 


the catalytic activity of Pt nanoparticles whereas MIL-101(Fe) causes a 
considerable decrease. X-ray photoelectron spectroscopy (XPS) meas- 
urements reveal a partial transfer of electrons from the Pt nanoparticles 
to MIL-101(Fe)** (Fig. 3a, b, d and Supplementary Note 4), whereas 
there is no obvious electron transfer from the Pt nanoparticles
to MIL-101(Cr) (Fig. 3a, c, d). This suggests the reduced electron den- 
sity of Pt nanoparticles in MIL-101(Fe)@Pt as a possible cause of the 
decreased hydrogenation activity. This is further supported by DFT cal- 
culations showing that the negative binding energy of hydrogen atoms 
(ΔE) on a platinum(111) (2 × 2) surface increases with positive charging
of the platinum surface (Supplementary Fig. 23), with the resulting
stronger surface binding of the hydrogen atoms reducing the catalytic 
activity of MIL-101(Fe)@Pt”. 

Guided by these observations, and given that MIL-101 possesses 
two characteristic pore windows (of about 1.3 nm and 1.5 nm) that are
large enough to accommodate A (1.05 nm × 0.65 nm; Supplementary
Table 1 | Selective hydrogenation of different α,β-unsaturated aldehydes by different catalysts

[Reaction scheme: hydrogenation of an α,β-unsaturated aldehyde (A) to products B, C and D. Substrates: cinnamaldehyde, furfural, 3-methyl-2-butenal and acrolein.]

                                                                  Selectivity (%)†
Entry  Catalyst                         Time (h)  Conversion (%)†  B      C      D      TOF‡ (h⁻¹)
Substrate: cinnamaldehyde*
1      MIL-101(Fe)@Pt@MIL-101(Fe)?*     8.5       45.0             94.1   2.9    3.0    18.0
2      MIL-101(Fe)@Pt@MIL-101(Fe)?*°    10        45.0             96.3   1.9    1.8    15.3
3      MIL-101(Cr)@Pt@MIL-101(Fe)??     2         45.0             94.2   4.7    1.1    76.3
4      MIL-101(Cr)@Pt@MIL-101(Fe)®®     6         45.0             94.5   2.8    2.7    25.4
5      MIL-101(Cr)@Pt@MIL-101(Cr)>+     0.7       45.0             79.2   14.6   6.2    218.1
6      MIL-101(Fe)@Pt                   1.25      45.0             86.4   9.6    4.0    122.1
7      MIL-101(Cr)@Pt                   0.41      45.0             44.0   40.0   16.0   372.4
8      Pt NPs                           0.41      45.0             18.3   60.6   21.1   372.4
9      MIL-101(Cr) or MIL-101(Fe)       24        0                —      —      —      0
10     MIL-101(Fe)@Pt@MIL-101(Fe)?*     24        94.3             97.0   0      3.0    13.3
11     MIL-101(Fe)@Pt@MIL-101(Fe)*2°    24        86.3             97.4   0.3    2.3    12.2
12     MIL-101(Cr)@Pt@MIL-101(Fe)??     20        99.8             95.6   0.8    3.6    16.9
13     MIL-101(Cr)@Pt@MIL-101(Fe)®®     20        90.6             94.3   2.1    3.6    15.4
14     MIL-101(Cr)@Pt@MIL-101(Cr)*!     2         95.1             62.6   12.2   25.2   161.3
Substrate: furfural*
15     MIL-101(Fe)@Pt@MIL-101(Fe)??     15        85.6             93.2   3.5    3.3    19.4
16     MIL-101(Cr)@Pt@MIL-101(Fe)®*     7         87.9             96.5   3.5    0      42.6
17     MIL-101(Cr)@Pt@MIL-101(Cr)>+     5         98.5             99.8   0      0.2    66.8
Substrate: 3-methyl-2-butenal*
18     MIL-101(Fe)@Pt@MIL-101(Fe)?*     17        71.1             87.8   3.2    9.0    14.2
19     MIL-101(Fe)@Pt@MIL-101(Fe)**°    24        59.9             92.5   2.8    4.7    8.5
20     MIL-101(Cr)@Pt@MIL-101(Fe)®®     10        85.0             64.4   7.9    27.7   28.8
Substrate: acrolein*
21     MIL-101(Fe)@Pt@MIL-101(Fe)??     3         76.9             75.7   19.0   5.3    86.9
22     MIL-101(Fe)@Pt@MIL-101(Fe)?*°    3         52.7             97.3   2.7    0      59.6
23     MIL-101(Cr)@Pt@MIL-101(Fe)®®     0.75      87.5             57.9   17.3   24.8   395.8
24     Au25(SR)18/Fe2O3||               3         47.0             92.0   —      —      3.1
25     Au clusters coated with tert-butyl(naphthalen-1-yl)phosphine oxide§
                                        18        61.0             >99    —      —      1.7

A, α,β-unsaturated aldehydes (cinnamaldehyde, furfural, 3-methyl-2-butenal or acrolein); B, unsaturated alcohol; C, saturated aldehyde; D, saturated alcohol.
*Reaction conditions: each catalyst contains 0.23 mg Pt NPs; the reaction requires 0.4 mmol of A, room temperature and 3 MPa H2.
†The percentage conversion of A and selectivity for specific products were determined by gas chromatography-mass spectrometry and gas chromatography or ¹H NMR.
‡TOF was calculated as (moles of converted A) × (moles of total Pt or Au)⁻¹ × h⁻¹.
||Reaction conditions²⁶: 100 mg 1% Au25(SR)18/Fe2O3, 0.1 mmol A, 0 °C and 0.1 MPa H2. The catalytic selective hydrogenation of acrolein has been described previously.
§Reaction conditions²⁷: 0.01 mmol Au clusters (assuming 46.28% Au on the basis of element analysis), 0.5 mmol A, 5 ml tetrahydrofuran, 40 °C and 4 MPa H2. The catalytic selective hydrogenation of acrolein has been described previously.


Table 2 and Figs 24, 25), we next explored the sandwich MIL-101@Pt@ 
MIL-101 structure as an ideal catalyst for selective hydrogenation of A 
to B. Entries 1-5 in Table 1 confirm that coating with MIL-101 shells 
of different thicknesses considerably enhances the selectivity for B: the 
selectivity with MIL-101(Cr)@Pt@MIL-101(Cr)>" is 79.2%, while using 
MIL-101(Cr) or MIL-101(Fe) cores that are coated with MIL-101(Fe) 
shells gives selectivities that are always higher than 94%. The improved 
selectivities are accompanied by a decrease in TOF values, which is 
expected because the MIL-101 shells slow down reactant and product 
diffusion. Coating with MIL-101(Fe) leads to an additional decrease 
in catalytic activity, owing to the interfacial electron-transfer effect 
(Fig. 3 and Supplementary Fig. 23). 
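
The TOF definition in the Table 1 footnote (moles of A converted per mole of Pt per hour) can be checked against the tabulated values with a short calculation. The sketch below is illustrative only: the function name and the platinum molar mass (195.084 g/mol, the standard atomic weight) are assumptions introduced here, while the Pt loading (0.23 mg), substrate amount (0.4 mmol), conversion and reaction times are taken from Table 1.

```python
# Standard atomic weight of platinum in g/mol (assumed here; not stated in the text).
PT_MOLAR_MASS = 195.084


def turnover_frequency(substrate_mmol, conversion_pct, time_h, pt_mg=0.23):
    """Return TOF in h^-1: mmol of A converted per mmol of Pt per hour."""
    converted_mmol = substrate_mmol * conversion_pct / 100.0
    pt_mmol = pt_mg / PT_MOLAR_MASS  # mg divided by g/mol gives mmol
    return converted_mmol / (pt_mmol * time_h)


# Table 1, entry 8 (Pt NPs) and entry 6 (MIL-101(Fe)@Pt), both at 45% conversion.
tof_pt_nps = turnover_frequency(0.4, 45.0, 0.41)
tof_fe_pt = turnover_frequency(0.4, 45.0, 1.25)
print(round(tof_pt_nps, 1), round(tof_fe_pt, 1))
```

Under these assumptions the two results agree with the tabulated 372.4 h⁻¹ and 122.1 h⁻¹ to within rounding, which suggests the footnote formula is applied to the total Pt content rather than to surface Pt atoms only.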

Our MIL-101@Pt@MIL-101 catalysts combine exceptionally high 
selectivity and conversion efficiency (Table 1, entries 10-14, and 
Supplementary Table 3), and outperform state-of-the-art catalysts 
(Supplementary Table 4). MIL-101(Cr)@Pt@MIL-101(Fe)”, for exam- 
ple, exhibits excellent selectivity (95.6%) and almost full conversion 
(99.8%) (Supplementary Fig. 26). Moreover, reusability tests indicate
that for MIL-101(Cr)@Pt@MIL-101(Fe)”’, both the conversion effi- 
ciency of A and the selectivity for B remain almost unchanged over 
five successive catalytic cycles (Supplementary Figs 26-30, 31a), with 
the stability of the system being further verified by XRD and trans- 
mission electron microscope (TEM) measurements (Supplementary 
Figs 31b, 32) that show no evidence for structural or morphological 
differences between fresh and used catalysts. Excellent catalytic 
stability is also obtained with MIL-101(Fe)@Pt@MIL-101(Fe)?? 
(Supplementary Fig. 33). 

To further confirm that the integration of Pt nanoparticles and 
CUSs into a single sandwich nanostructure results in useful selec- 
tive hydrogenation capabilities, we used MIL-101@Pt@MIL- 
101 to hydrogenate smaller-sized α,β-unsaturated aldehydes.
Acrolein (0.69 nm × 0.51 nm), with no substituents on the C=C bond,
branched 3-methyl-2-butenal (0.79 nm × 0.60 nm), and furfural
(0.81 nm × 0.64 nm), with a furan ring (see Supplementary Fig. 34 for
Figure 2 | Investigation of the reaction mechanism on MIL-101@Pt.
a, b, Theoretical calculations of the interaction of cinnamaldehyde
(A) with Fe trimers (a) or Cr trimers (b), and the subsequent selective
hydrogenation of A to cinnamyl alcohol (B) and hydrocinnamaldehyde
(C), on a MIL-101(Fe)@Pt or MIL-101(Cr)@Pt catalyst. Brown, carbon; 
pink, hydrogen; red, oxygen; gold, iron; blue, chromium; green, chlorine. 


structures), are all converted through preferential hydrogenation of the 
C=O group over the C=C group (Table 1, Supplementary Tables 5-7 
and Supplementary Figs 35-37). Conversion efficiencies are 52.7%, 
59.9% and 98.5%, respectively, with selectivities for hydrogenation of 
the C=O group being 97.3%, 92.5% and 99.8%. When compared with 
other catalysts²⁶,²⁷, MIL-101(Fe)@Pt@MIL-101(Fe)**° also exhibits
excellent selectivity towards allyl alcohol (Table 1, entries 24 and 25; 
Supplementary Note 5 and Supplementary Table 8). 
Figure 3 | XPS profiles of Pt, Fe, Cr and O in different catalysts. a, Pt 4f
level. b, Fe 2p level. c, Cr 2p level. d, O 1s level. The catalysts investigated 
were pure Pt NPs, MIL-101(Fe), MIL-101(Cr), MIL-101(Cr)@Pt and 
MIL-101(Fe)@Pt. Numbers adjacent to dashed lines represent the peaks’ 
binding energy values. 
To explore the generality of the catalyst preparation method, we also
synthesized?8-°° the sandwich structures MOF-525(Zr)@Pt@MOF- 
525(Zr)?°°, MOF-74(Co)@Pt@MOF-74(Co)*4, UiO-66(Zr) @Pt@UiO- 
66(Zr)!"?, and UiO-67(Zr)@Pt@UiO-67(Zr)**" (Supplementary Table 
2 and Supplementary Figs 38-45), which catalysed the hydrogenation 
of A with conversion efficiencies of 35.2%, 35.0%, 90.0% and 69.4%, 
respectively, and selectivities towards B of 85.0%, 70.1%, 65.0% and 
73.0%, respectively (Supplementary Figs 46-49 and Supplementary 
Table 9). Finally, as an example of a system containing different metal 
nanoparticles, we also prepared MIL-101(Fe)@Ru@MIL-101(Fe)**, 
which hydrogenated A with a conversion efficiency of 48.7% and 
a selectivity towards B of 58.5% (Supplementary Figs 50-52 and 
Supplementary Table 10). These results illustrate the generality of the 
concept of using CUSs in MOFs as a means of tuning the selectivity 
of the hydrogenation of α,β-unsaturated aldehydes. Moreover, using
commercial Pt/C (carbon) and Pt/Fe2O3 for the hydrogenation of A
(Supplementary Figs 53, 54) resulted in conversion efficiencies of 98.1% 
and 54.5% and selectivities towards B of 39.9% and 84.5%, respectively 
(Supplementary Table 11), with the different selectivities suggesting 
that Fe-based supports favour the hydrogenation of C=O groups over 
C=C groups (although not to the extent seen with the sandwich forms 
of catalysts). Taken together, our observations confirm the considerable 
potential of MOFs as a new generation of heterogeneous catalytic 
supports that may prove effective when targeting important but highly 
challenging reactions. 
Mantle dynamics inferred from the crystallographic
preferred orientation of bridgmanite 


Noriyoshi Tsujino!, Yu Nishihara’, Daisuke Yamazaki!, Yusuke Seto’, Yuji Higo* & Eiichi Takahashi? 


Seismic shear wave anisotropy! © is observed in Earth’s uppermost 
lower mantle around several subducted slabs. The anisotropy caused 
by the deformation-induced crystallographic preferred orientation 
(CPO) of bridgmanite (perovskite-structured (Mg,Fe)SiO3) is
the most plausible explanation for these seismic observations. 
However, the rheological properties of bridgmanite are largely 
unknown. Uniaxial deformation experiments’ ° have been carried 
out to determine the deformation texture of bridgmanite, but the 
dominant slip system (the slip direction and plane) has not been 
determined. Here we report the CPO pattern and dominant slip 
system of bridgmanite under conditions that correspond to the 
uppermost lower mantle (25 gigapascals and 1,873 kelvin) obtained 
through simple shear deformation experiments using the Kawai- 
type deformation-DIA apparatus’. The fabrics obtained are 
characterized by [100] perpendicular to the shear plane and [001] 
parallel to the shear direction, implying that the dominant slip 
system of bridgmanite is [001](100). The observed seismic shear-wave
anisotropies near several subducted slabs¹⁻⁴ (Tonga–Kermadec,
Kurile, Peru and Java) can be explained in terms of the CPO of 
bridgmanite as induced by mantle flow parallel to the direction of 
subduction. 

Recent P-wave seismic tomography¹¹ revealed four geometries of subducted
slabs in the mantle. Type I refers to a slab that is stagnant above
the 660-km discontinuity. Type II is assigned to slabs that penetrate
the 660-km discontinuity (for example, Kuril). Type III slabs lie in the 
uppermost lower mantle at depths between 660 km and 1,000 km (for example,
Tonga—Kermadec, Sumatra and South America). Type IV slabs con- 
tinuously descend into the mid lower mantle. Seismic tomography"? 
provides morphological images of the subducted slabs in Earth’s man- 
tle, but does not give any insight into the dynamic features of the slab. 
However, the seismic anisotropy observed in the mantle provides infor- 
mation on the flow direction of the slabs in the deep mantle because 
this anisotropy may be the reflection of the CPO of the constituent 
minerals, which is yielded by dislocation creep deformation. 

As shown in previous reports, an important anisotropy that demon- 
strates clear shear-wave splitting was observed in the uppermost lower 
mantle of the Tonga-Kermadec slab region'”. Recently, such anisotro- 
pies were also observed in the uppermost lower mantle of other slab 
regions* ©. In these observations, shear wave splitting shows fast polari- 
zation that is generally parallel to the plane of the subducted plate (for 
example, Tonga–Kermadec, Java, Peru and Kurile)¹⁻⁴, whereas other
observed slab regions do not show a consistent splitting pattern’. It 
is therefore interesting to examine how the observed shear-wave 
anisotropy could be interpreted by the CPO of the lower-mantle minerals. 
It is generally accepted that in the pyrolitic mantle model, the consti- 
tuents of the lower mantle are bridgmanite or (Mg,Fe)SiO3 perovskite, 
of space group Pbnm (77 vol%), ferropericlase (16 vol%) and CaSiO3 
perovskite (7 vol%)!*. The contribution of the CPO of ferropericlase to 
seismic anisotropy is negligible because of its nearly isotropic elasticity 


at the uppermost lower mantle conditions!°. CaSiO3 perovskite is elastically
anisotropic’* but the contribution of this phase to seismic anisotropy
will not be substantial because of the small amount of CaSiO3 in
the lower mantle. In contrast, it is anticipated that considerable seismic 
anisotropy can be produced by the CPO of bridgmanite'”/* on the basis 
of its high elastic anisotropy'””° and overwhelming presence in the 
lower mantle. We therefore focused on the deformation-induced CPO 
of bridgmanite as the most important factor in lower-mantle seismic 
anisotropy. 

We performed shear deformation experiments on bridgmanite with 
a controlled strain rate under lower-mantle conditions by employing 
the deformation-DIA-type multi-anvil press with the Kawai-type cell 
assembly (6-8 type)!°. Dense, sintered (Mg0.97Fe0.03)SiO3 bridgmanite
aggregates synthesized at 25 GPa and 1,873 K were used as samples 
for the deformation experiments. A backscattered electron image (Fig. 1a) 
and pole figures (Fig. 2a) of sintered bridgmanite revealed that the 
sample was an equigranular aggregate with a typical grain size of 15 µm
and a random crystallographic orientation, determined from two- 
dimensional monochromatic X-ray diffraction patterns (see Methods). 
In the deformation experiments, a thin ellipse of the bridgmanite aggre- 
gate with a strain marker of Ni foil (which was initially set at the middle 
of the sample, perpendicular to the cut surfaces of the pistons) was 
sandwiched between 45°-cut polycrystalline dense alumina pistons at 
the centre of the cell assembly. As the differential rams of the deforma- 
tion-DIA press advance, the alumina pistons apply simple shear to the 
thin bridgmanite sample. 

Figure 1b and c shows the microstructures of the recovered samples 
from the undeformed (annealing only) experiment (run K116) and 
from the deformation experiment (run K122), respectively, both per- 
formed at identical pressure (P) and temperature (T) conditions 
(25 GPa, 1,873 K). In the former, the strain marker was almost normal 
to the cut surfaces of the alumina pistons, showing that the shear strain 
of the bridgmanite sample was very close to zero. The sample was thus 
almost free from deformation throughout the experimental processes; 
that is, compression, heating, annealing and subsequent decompression.
As shown in Fig. 1c, the strain marker was clearly tilted in the
deformed sample. From the tilting angle of the strain marker (~38°) 
and the deformation time (1h), the total strain and average strain rate 
were determined to be γ ≈ 0.8 and γ̇ ≈ 2 × 10⁻⁴ s⁻¹, respectively. The
thickness of the deformed sample was reduced by about 20% in com- 
parison with the original, suggesting that a uniaxial component of 
deformation was present. The microstructural observations in Fig. 1d
show that elongated bridgmanite grains of sizes up to 30–40 µm accumulated
in the shear direction and are surrounded by small grains of
less than a few micrometres. The structural features indicate that 
dynamic recrystallization occurred. The CPO of bridgmanite could 
therefore be induced by deformation involving dislocations. 
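
The strain and strain-rate values quoted above follow from the tilt of the strain marker. As a minimal sketch, assuming ideal simple shear with a marker that starts perpendicular to the shear plane (so that the shear strain is the tangent of the tilt angle, a standard relation that the text does not state explicitly), the numbers can be reproduced as follows. The function names are hypothetical.

```python
import math


def shear_strain_from_tilt(tilt_deg):
    """Shear strain for ideal simple shear of an initially perpendicular marker."""
    return math.tan(math.radians(tilt_deg))


def average_strain_rate(strain, duration_s):
    """Average strain rate in s^-1 over the deformation time."""
    return strain / duration_s


# Run K122: marker tilted by about 38 degrees after 1 h of deformation.
gamma = shear_strain_from_tilt(38.0)
gamma_dot = average_strain_rate(gamma, 3600.0)
print(f"strain = {gamma:.2f}, strain rate = {gamma_dot:.1e} s^-1")
```

With a 38° tilt this gives a strain close to 0.8 and a rate close to 2 × 10⁻⁴ s⁻¹, in line with the reported values. The estimate ignores the roughly 20% thinning of the sample, which adds a uniaxial component that pure simple shear does not capture.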

The crystallographic orientation of the bridgmanite was determined 
using the two-dimensional monochromatic X-ray diffraction pattern 
Figure 1 | Backscattered electron images of the bridgmanite aggregates.
a, Sintered sample. b, Sample annealed at 25 GPa and 1,873 K for 10 min
(run K116). c, d, Sample deformed at 25 GPa and 1,873 K for 1 h (run
K122) before (c) and after (d) etching with colloidal silica. In a and d
the polished surfaces were etched with colloidal silica to clarify the grain
boundaries. In b and c the black arrows indicate the Ni foil strain markers.
The strain marker in the annealed sample in b remains perpendicular to the
45°-cut surfaces of the alumina pistons, indicating almost no deformation
during compression and annealing. In c, the Ni strain marker in the
deformed sample is tilted by 38°, indicating a strain of γ ≈ 0.8. Elongated
grains, indicating dynamic recrystallization, were observed only in the
deformed sample (black arrow in d). The x, y and z axes correspond to the
shear direction, the shear plane normal and the direction perpendicular to
both x and y, respectively. This definition of coordinates is used in
Figs 2 and 3 and in Extended Data Fig. 7.


method”?! for both the starting and deformed samples. The sheared 
bridgmanite aggregate demonstrated strong fabric (Fig. 2b), whereas 
the starting material did not show a notable CPO pattern (Fig. 2a). In 
the sheared bridgmanite, the [100] axis was oriented perpendicular to 
the shear plane. Although weak girdle patterns of the [001] and [010] 
axes (which correspond to uniaxial compressive strain) were observed, 
the [001] and [010] axes were mostly aligned parallel and normal to 
the shear direction on the shear plane, respectively. Experimental tex- 
tures are compatible with orientation-producing deformation mech- 
anisms such as dominant slip on [001](100) at 25 GPa and 1,873 K, 
which correspond to conditions found in the uppermost lower mantle. 

The dominant slip system of bridgmanite at high deviatoric stress 
has been reported through high-pressure experiments”*”* and 
theoretical calculations'®:3, as shown in Extended Data Table 1. 
Using a diamond anvil cell, uniaxial deformation experiments 
were conducted on bridgmanite at room temperature””. No clear 
CPO pattern was observed’, probably owing to insufficient strain, 
whereas dominant slip systems were reported as [100], [010] and 
<110> on the (001) plane below 55 GPa and the (100) plane over 
55 GPa (ref. 22). Uniaxial stress relaxation experiments (USRE) 
on bridgmanite were carried out using the Kawai-type multi-anvil 
press at 25 GPa and 1,673 K (ref. 8). Analysis of the recovered sam- 
ples using the X-ray peak broadening technique suggested that the 
dominant slip direction is [100], in contradiction with our results. 
Dense deformation bands across the pre-existing twin boundaries 
were observed in the USRE⁸, suggesting that the bridgmanite aggregate
was deformed under very high deviatoric stress. Extremely high
deviatoric stress values of up to several gigapascals, induced by com- 
pression at room temperature, were directly measured during in situ 
USRE of olivine polymorphs”*”>. The [100] slip direction observed 
in USRE of bridgmanite⁸ might be the result of very high deviatoric
stress. 
According to first-principles calculations” the [100](001) slip system
is one of the most easily activated slip systems of bridgmanite in
the dislocation glide region. This supports the hypothesis that the slip 
direction observed in the USRE results from dislocation glide. On the 
other hand, another study based on first-principles calculations with a 
visco-plastic self-consistent model!* suggested that the most easily acti- 
vated slip system is [010](100) at 0 K for a pressure range of 0-100 GPa 
in the dislocation glide region. Both calculations and previous experi- 
mental results are inconsistent with the present results. This could be 
due to differences in the dominant slip mechanisms or the dominant 
deformation mechanisms found for varying experimental conditions 
(for example, stress, temperature and pressure), such as in the case of 
olivine”. 

CaTiO3 perovskite is often used as an analogue material for bridgmanite
because the SiO6 or TiO6 octahedra have similar rotation
angles. Uniaxial deformation experiments were performed on CaTiO3
perovskite”’, in which dislocation creep was expected owing to the high
temperature (up to 1,973 K) and low deviatoric stress (25-120 MPa).
TEM observations of the recovered sample revealed that screw dislocations
with Burgers vectors [100]pc and [011]pc on the (01−1)pc plane,
where indexing is based on the pseudo-cubic system, formed rectangular
networks. The [100]pc and [011]pc directions on the (01−1)pc
plane in pseudo-cubic indexation of perovskite include the 1/2[001]
and [010] directions on the (100) plane in the orthorhombic system.
The [001](100) slip system in the present study does not contradict that 
of CaTiO3 perovskite. 

The seismic wave anisotropy formed by the CPO of deformed bridg- 
manite in the present study, as shown in Fig. 3 and Extended Data Table 2, 
was calculated on the basis of the elastic constant (Extended Data 
Table 3)'°. For the shear-wave anisotropy of deformed bridgmanite,
the velocity of the horizontally polarized shear waves, V_SH, is
around 1% higher than that of vertically polarized shear waves, V_SV,
in horizontal flow, as shown in Fig. 3b, whereas V_SV is 0.03%–1.10%
higher in vertical flow (Fig. 3d). As shown in Fig. 4a, shear-wave
splitting with V_SH > V_SV is observed in the uppermost lower mantle
from the Tonga-Kermadec subduction zone to the Australian 
Figure 2 | Pole figures of bridgmanite showing the variation in the
crystallographic orientation of the [100], [010] and [001] directions.
a, b, The coordinate system is defined with respect to the deformation
geometry of sintered bridgmanite (a) and deformed bridgmanite (K122; b)
at γ ≈ 0.8 and γ̇ ≈ 2 × 10⁻⁴ s⁻¹. Black dashed lines represent the long axis
of the strain ellipsoid of the deformed bridgmanite. The colour scale shows
the intensity of the concentration of each axis as a multiple of the uniform
density.
Figure 3 | Seismic wave anisotropy of bridgmanite aggregates deformed under shear at uppermost lower-mantle pressures and temperatures. These
values were calculated using the ANISch5 and VpG software**. a, c, P-wave seismic velocity (V_P) for horizontal (a) and vertical (c) flow, respectively.
b, d, The fastest S-wave seismic velocity (V_S1) polarization plane for horizontal (b) and vertical (d) flow, respectively. Black dashes show the polarization
direction of the fastest S-wave.


Figure 4 | Schematic cross-sections of subducted slabs. a, The Tonga–Kermadec
arc. b, The Java arc. c, The Peru arc. d, The Kuril arc. All
cross-sections are based on P-wave tomography¹¹. The coloured lines
represent the shear seismic ray paths that are nearly parallel (blue) and
perpendicular (orange) to the section for the observation of shear-wave
anisotropies at the Tonga–Kermadec slab!“ and the Kuril, Peru and
Java slabs°, where the observation area, direction and stagnation depth
were different. The blue and orange polar circular histograms show the
polarization of the fast shear waves perpendicular to the blue and orange
seismic ray paths, respectively. The fast polarization arrangement is
almost parallel to the plane of the subducted plates. Yellow dashed lines
correspond to the flow directions of the subducted slab. Black double
arrows represent shear directions. The uppermost lower mantle near
the subducted slab is deformed by the flow directions that are parallel to
the subducted slab, which align the [100] and [001] axes of bridgmanite
normal to the shear deformation plane and parallel to the shear
direction, respectively. The lower inset in a shows the crystal structure of
bridgmanite. Blue octahedra are the Si sites and the orange and red spheres
are the Mg and Si atoms, respectively.
continental seismic stations”, where the ray path is nearly parallel to
the section. The delay time ranges from 0.7 s to 6.2 s (refs 1 and 2),
which corresponds to an average value of (V_SH − V_SV)/V_SH ≈ 0.1%–1%,
assuming a 3,000–6,000-km ray path including a 1,000–2,000-km
subducted plate region¹¹ in the uppermost lower mantle. The magnitude
of the shear-wave anisotropy calculated here is consistent with
deformed bridgmanite (Fig. 3b). In contrast, the opposite shear-wave
anisotropy (V_SV > V_SH) was observed from the Tonga–Kermadec subduction
zone at the western North America stations’, where the ray path
is nearly perpendicular to the section. It is thus concluded that both
of the observed shear-wave anisotropies around the Tonga–Kermadec
subduction zone, which are type III in the P-wave tomography¹¹,
are well explained by the CPO of bridgmanite yielded by the penetration 
of subducted slabs down to 1,000 km by vertical motion and subsequent 
stagnation at similar depths, with a horizontal flow in the pyrolitic man- 
tle (as shown in Fig. 4a). This flow pattern could be caused by the 
viscosity hill?*-*° at depths between 1,000 km and 1,500km, and the 
viscosity reduction of the slab by interconnection of ferropericlase in 
the post-spinel phase*!. Shear wave anisotropies at the top of the lower 
mantle beneath the Java (type III), Peru (type III) and Kuril (type II)
subducted slabs were observed along the seismic ray path nearly 
perpendicular or parallel to the subducted plate surface? (Fig. 4b-d). 
In these observations, the fast polarizations in the shear-wave splitting 
appear to be arranged roughly parallel to the plane of the plate’, which 
is explained by the deformation-induced CPO of bridgmanite owing to 
flow parallel to the direction of subduction. The findings presented here 
on the CPO of bridgmanite, coupled with the observations of seismic 
anisotropy, provide a greater understanding of the direction of mantle 
flow, confirming inferences from seismic tomography data. 
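
The conversion from splitting delay time to path-averaged anisotropy used above can be illustrated with a first-order estimate. For a small velocity difference, the delay accumulated over a path of length L is roughly L·A/V, where A is the fractional difference between fast and slow shear velocities and V is the mean shear velocity, so A ≈ δt·V/L. The sketch below is a rough check, not the authors' calculation: the mean shear velocity of 6 km/s is an assumed round number that does not appear in the text, and the function name is hypothetical. The delay times (0.7 s to 6.2 s) and path lengths (3,000 km to 6,000 km) are those quoted in the paragraph.

```python
def fractional_anisotropy(delay_s, path_km, mean_vs_km_s=6.0):
    """First-order path-averaged anisotropy from a splitting delay time.

    Uses delay ~ path * A / V, valid when A is much smaller than 1.
    mean_vs_km_s is an assumed average shear velocity (not from the paper).
    """
    return delay_s * mean_vs_km_s / path_km


# Smallest delay over the longest path, largest delay over the shortest path.
low_pct = 100.0 * fractional_anisotropy(0.7, 6000.0)
high_pct = 100.0 * fractional_anisotropy(6.2, 3000.0)
print(f"{low_pct:.2f}% to {high_pct:.2f}%")
```

With these inputs the estimate spans roughly 0.07% to 1.2%, the same order of magnitude as the 0.1%–1% range quoted in the text. The result scales linearly with the assumed mean velocity, so a different choice shifts both bounds proportionally.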
METHODS


Preparation of the starting material of bridgmanite aggregates. Well-sintered 
(Mg0.97Fe0.03)SiO3 bridgmanite aggregates that were free from a CPO were
prepared using the following procedures. The composition is very close to
the bridgmanite in the post-spinel assembly formed from San Carlos olivine
(Mg0.9Fe0.1)2SiO4 (ref. 33). First an orthopyroxene (opx) powder with the composition
of (Mg0.97Fe0.03)SiO3 was synthesized from a mixture of MgO, SiO2 and
Fe2O3 with the prescribed ratio at 1,673 K and an oxygen fugacity (fO2) of fayalite-magnetite-
quartz (FMQ; 1 log unit) using a gas mixture (H2 and CO) furnace. The opx
powder was then put in an iron inner capsule and sintered in a piston-cylinder 
apparatus at 1 GPa and 1,473 K. To avoid the formation of cracks in the sintered 
opx, we used pyrex glass as an outer capsule and decompressed the sample slowly 
at temperatures higher than 1,073 K after sintering. Bridgmanite aggregates for the 
deformation experiments were synthesized from the sintered opx at 25 GPa and 
1,873 K for 1h in the Kawai-type apparatus. It is known that substantial amounts 
of CPO develop in the bridgmanite aggregate, accompanied by conversion from 
opx under high deviatoric stress conditions’. In the synthesis process used for 
bridgmanite, we therefore use an NaCl capsule to reduce the deviatoric stress*4, 
The obtained bridgmanite aggregates were examined by field emission scanning 
electron microscopy and 2D X-ray diffraction (see Sample characterization section).
The backscattered electron image (Fig. 1a) and the pole figures (Fig. 2a) of
the sintered bridgmanite aggregates confirmed that the aggregates had a random
crystallographic orientation with a grain size of around 15 µm and are suitable as
a starting material for the shear deformation experiments and subsequent CPO 
analyses. 

Deformation experiments. Shear deformation experiments at lower mantle con- 
ditions were conducted using the deformation-DIA type apparatus, known as the 
Kawai-type apparatus for triaxial deformation (KATD)*, installed at the Tokyo 
Institute of Technology. Extended Data Fig. 1a shows a schematic of the cell assem- 
bly adopted in the present deformation experiments. A cylinder of LaCrO3 was used 
as a heater. The upper and bottom pistons of well-sintered hard alumina are cut 
to make 45° surfaces that are used to apply shear stress to the sample mounted between
them when the differential rams are driven. A Pt foil (50 µm thick) was placed at
the ends of the 45°-cut alumina pistons to reduce the friction against the sideslip.
A Ni foil (30 µm thick) set at the middle of the ellipse sample served as a strain
marker, which also kept the oxygen fugacity of the sample equal to or less than the 
Ni-NiO buffer. The temperature was determined from the electric power on the 
basis of calibrations performed using a similar cell assembly with thermocouple 
(Extended Data Fig. 1b). Extended Data Fig. 2 shows the relationship between 
the generated temperature and the electric power obtained in two calibration 
runs (K63 and K71). It should be noted that both curves almost overlapped each 
other up to 1,873 K. Pressure calibration of the KATD apparatus was carried out 
by detecting the α–β transformation in Mg2SiO4 (at 15.1 GPa), the β–γ 
transformation in Mg2SiO4 (at 19.8 GPa) and the dissociation of γ-Mg2SiO4 to MgSiO3 
bridgmanite + MgO (periclase) (at 23.6 GPa), all at a temperature of 1,873 K. 

The specimens were first compressed to the desired pressure (25 GPa) at room 
temperature, and then heated at 1,873 K for 10 min to relax the deviatoric stress in 
the specimens. In run K116, which is the undeformed experiment, the specimens 
were quenched by shutting off the electric power and recovered to observe the 
state of the sample just before the deformation experiment. In the deformation 
experiment run (K122), the sample was also first annealed at 1,873 K for 10 min 
and then deformed by advancing each of the upper and lower differential rams to 
a displacement of 75 µm (that is, 150 µm in total) for a duration of 1 h at 1,873 K 
(see Extended Data Fig. 3). Throughout the heating and the deformation stages, 
the load of the main ram was kept constant. The loads of the differential rams were 
linearly increased to advance the differential rams at high temperature (1,873 K). 
High-pressure experiments were conducted under nominally dry conditions. As 
the water solubility of Al-free bridgmanite is very low’ (<1 parts per million), the 
effect of water is considered to be negligible in the present study. 

Sample characterization. To observe the microstructure, the starting material and 
the recovered sample were polished with SiC sand papers and diamond paste in 
sequence, and were then etched by colloidal silica to clarify the grain boundaries. The 
microstructures of the samples were investigated by a field emission scanning elec- 
tron microscope (JSM-7001F) at ISEI, Okayama University, Japan, as shown in Fig. 1. 

The electron beam used in the electron backscattered diffraction technique 
that is usually applied to measure crystallographic orientation would cause serious 
damage to the bridgmanite crystals. Therefore, we determined the crystallographic 
orientation of the bridgmanite by the two-dimensional monochromatic X-ray dif- 
fraction pattern method” for both the starting and the deformed samples. The 
two-dimensional monochromatic X-ray diffraction patterns were acquired by 
the imaging plate (FUJIFILM, 200 mm × 250 mm) at the synchrotron beam line 
BL04B1 of SPring-8 at the Japan Synchrotron Radiation Research Institute (JASRI), 
Hyogo, Japan. A monochromatic X-ray of 61.388 keV (wavelength λ = 0.20197 Å) 
was employed. X-ray diffraction measurements were collected using a beam size of 
100 µm × 200 µm and a sample length of 500 µm. As shown in Extended Data Fig. 4, 
the direction of the X-ray beam is parallel to the shear plane and perpendicular to 
the shear direction. An azimuth angle φ of zero degrees corresponds to the direction 
perpendicular to the shear deformation direction. The distance between the sample and 
the imaging plate was about 601.8 mm, calibrated using a CeO2 standard. The 
typical acquisition time was 10 min. The imaging plate data were digitized using a 
FUJI BAS2000 reader under a resolution of 100 µm × 100 µm. Extended Data Fig. 5 
represents the two-dimensional and one-dimensional X-ray diffraction patterns 
converted from IP data. In the CPO analysis of the bridgmanite aggregates, the 
(111), (020), (120), (210), (022), (202), (113), (122), (212), (023) and (221) diffrac- 
tion peaks were adopted. The software ReciPro”! was used for the CPO analysis 
(this software was previously used for CPO analysis in other studies’). The 
analysis was carried out under the following parameters: number of crystallites 
(2,000,000); size of the crystallites (1 µm); beam convergence angle (0.02°); beam 
monochromaticity (0.1%); step increment (1%); and directional density (60%). As 
shown in Extended Data Fig. 6, the observed intensities for all peaks are in good 
agreement with those of the simulated peaks. In addition, the CPO results were 
reanalysed by the MAUD program. As shown in Extended Data Fig. 7, the CPO 
patterns of the deformed bridgmanite obtained from the ReciPro and MAUD programs 
were confirmed to be the same, although the CPO intensities are not perfectly 
identical because of the different algorithms used in each program. 
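
The quoted photon energy and wavelength can be cross-checked with the relation λ = hc/E. The following is a minimal sketch of that arithmetic only; the function name and the constant name are illustrative and are not part of the original analysis.

```python
# Convert a photon energy (keV) to a wavelength (angstrom) with lambda = h*c / E.
# H_C_KEV_ANGSTROM is the product h*c expressed in keV·Å (derived from the
# exact SI values of h, c and e); the name is an illustrative assumption.
H_C_KEV_ANGSTROM = 12.398419843


def wavelength_angstrom(energy_kev: float) -> float:
    """Return the photon wavelength in angstroms for an energy in keV."""
    if energy_kev <= 0:
        raise ValueError("energy must be positive")
    return H_C_KEV_ANGSTROM / energy_kev


if __name__ == "__main__":
    # Energy quoted in the Methods: 61.388 keV
    print(f"{wavelength_angstrom(61.388):.5f} angstrom")
```

For 61.388 keV this agrees with the stated wavelength of 0.20197 Å to the quoted precision.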
Extended Data Figure 1 | Schematic cross-sections of cell assemblies. a, Shear deformation experiments. b, Temperature calibration experiment. 
Extended Data Figure 2 | The temperature and power relationship at 25 GPa using the cell assembly for the temperature calibration experiments. The results for the two runs are shown by black and red lines. 
Extended Data Figure 3 | Loads and strokes of the differential rams during the deformation experiment (run K122) are shown as functions of 
Extended Data Figure 4 | Schematic of the two-dimensional X-ray diffraction measurement set-up. For these measurements the direct beam stopper arm was located at the right side when the assembly is viewed from the front. 
Extended Data Figure 5 | X-ray diffraction pattern of deformed bridgmanite. a, Two-dimensional X-ray diffraction pattern where φ is the azimuthal angle. b, One-dimensional X-ray diffraction pattern of the deformed bridgmanite (K122). In a, the two-dimensional X-ray diffraction pattern at the imaging plate is taken from the back surface of the sample, which is opposite to the polished plane. The position of the direct beam stopper arm is thus on the left. The diffraction peaks of the (111), (020), (120), (210), (022), (202), (113), (122), (212), (023) and (221) planes were used in the analysis of the CPO patterns. Brg, bridgmanite. 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


[Figure panels a–k: normalized intensity plotted against azimuth angle (°), from 0 to 360, for each diffraction peak; raw data are compared with simulation data.] 


Extended Data Figure 6 | Normalized intensities of diffraction peaks of the deformed bridgmanite. Each diffraction peak intensity was normalized by its average value. a–k, Diffraction peaks for the (111) (a), (020) (b), (120) (c), (210) (d), (022) (e), (202) (f), (113) (g), (122) (h), (212) (i), (023) (j) and (221) (k) planes. 
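
The normalization described in this caption (each peak intensity divided by its average over azimuth) can be sketched as follows. The intensity values are hypothetical placeholders chosen for illustration, not measured data, and the function name is an assumption.

```python
def normalize_by_mean(intensities):
    """Divide every intensity by the mean of the series, so the result averages to 1."""
    if not intensities:
        raise ValueError("need at least one intensity value")
    mean = sum(intensities) / len(intensities)
    if mean == 0:
        raise ValueError("mean intensity is zero; cannot normalize")
    return [value / mean for value in intensities]


if __name__ == "__main__":
    # Hypothetical intensities of one diffraction peak sampled at four azimuth angles
    example = [80.0, 100.0, 120.0, 100.0]
    print(normalize_by_mean(example))
```

A value above 1 then marks azimuths where the peak is stronger than its average, which is how preferred orientation shows up in such plots.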




[Figure: pole figures for the 200, 020 and 002 poles, drawn with reference axes X and Y.] 


Extended Data Figure 7 | Pole figures of bridgmanite showing the variation of the CPO of the [200], [020] and [002] directions determined using 
MAUD. mrd, multiples of a random distribution. 
Extended Data Table 1 | Summary of the dominant slip system found for bridgmanite in various studies 
Study | Dominant slip system | Composition | Pressure | Temperature | Methods 
This study | [001](100) | (Mg0.97Fe0.03)SiO3 | 25 GPa | 1,873 K | Shear deformation 
Merkel et al. (2003) | Not observed | MgSiO3 | <32 GPa | 300 K | Uniaxial deformation by DAC 
Miyagi et al. (2016) | [100], [010] and <110> on (001) below 55 GPa; (100) plane over 55 GPa | (Mg,Fe)SiO3 with (Mg,Fe)O | <65 GPa | 300 K | Uniaxial deformation by DAC 
Cordier et al. (2004)8 | [100] | MgSiO3 | 25 GPa | 300 K–1,673 K | USRE 
Mainprice et al. (2008) | [010](100) | MgSiO3 | 30 GPa | 0 K | FP with VPSC 
Ferré et al. (2007) | [100](010), [010](100) | MgSiO3 | 30 GPa | 0 K | FP 

USRE, Uniaxial stress relaxation experiments; FP, first principles; VPSC, visco-plastic self-consistent. 
Extended Data Table 2 | Elasticity (Cij) of the deformed bridgmanite aggregate at 25 GPa and 1,873 K 

i     j=1     j=2     j=3     j=4     j=5     j=6 
1   557.6   218.1   219.9    -0.5     0.2     2.8 
2           567.7   221.4    -0.4    -0.1     3.2 
3                   572.2    -0.6     0.3     0.5 
4                           174.4     1.3     0.1 
5                                   171.7     0.2 
6                                           171.8 

All values are in GPa. Parameters i and j indicate the reference axes: 1 corresponds to y (shear plane normal), 2 to x (shear direction) and 3 to z (perpendicular to both 1 and 2) in Fig. 2. 
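
The table lists only the upper triangle of the symmetric 6 × 6 stiffness matrix (Cij = Cji). The sketch below rebuilds the full matrix from the tabulated values and, as an illustration, computes the Voigt-average bulk and shear moduli. The Voigt average is a standard textbook estimate introduced here as an assumption; it is not a calculation reported in the source, and the function names are hypothetical.

```python
# Upper triangle of Cij (GPa) as tabulated: row i holds C[i][i], C[i][i+1], ...
UPPER_TRIANGLE_GPA = [
    [557.6, 218.1, 219.9, -0.5, 0.2, 2.8],
    [567.7, 221.4, -0.4, -0.1, 3.2],
    [572.2, -0.6, 0.3, 0.5],
    [174.4, 1.3, 0.1],
    [171.7, 0.2],
    [171.8],
]


def full_symmetric(upper):
    """Expand an upper-triangular listing into a full symmetric matrix."""
    n = len(upper)
    c = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + offset
            c[i][j] = value
            c[j][i] = value  # symmetry: Cij = Cji
    return c


def voigt_moduli(c):
    """Voigt-average bulk modulus K and shear modulus G (same units as c)."""
    diag = c[0][0] + c[1][1] + c[2][2]
    off = c[0][1] + c[0][2] + c[1][2]
    shear = c[3][3] + c[4][4] + c[5][5]
    k = (diag + 2.0 * off) / 9.0
    g = (diag - off + 3.0 * shear) / 15.0
    return k, g


if __name__ == "__main__":
    k_voigt, g_voigt = voigt_moduli(full_symmetric(UPPER_TRIANGLE_GPA))
    print(f"K_V = {k_voigt:.1f} GPa, G_V = {g_voigt:.1f} GPa")
```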
Extended Data Table 3 | Elasticity (Cij) of a single crystal of a bridgmanite aggregate used to calculate the elasticity of deformed bridgmanite aggregate 

i     j=1     j=2     j=3     j=4     j=5     j=6 
1     539     220     210       0       0       0 
2             595     232       0       0       0 
3                     561       0       0       0 
4                             187       0       0 
5                                     178       0 
6                                             156 

The data for the single crystal of bridgmanite are from ref. 19. All values are in GPa. Parameters i and j indicate the reference axes: 1 corresponds to [100], 2 corresponds to [010] and 3 corresponds to [001] in space group Pbnm. 
Wild monkeys flake stone tools 


Tomos Proffitt1*, Lydia V. Luncz1*, Tiago Falótico2, Eduardo B. Ottoni2, Ignacio de la Torre3 & Michael Haslam1 


Our understanding of the emergence of technology shapes how 
we view the origins of humanity’. Sharp-edged stone flakes, 
struck from larger cores, are the primary evidence for the earliest 
stone technology’. Here we show that wild bearded capuchin 
monkeys (Sapajus libidinosus) in Brazil deliberately break stones, 
unintentionally producing recurrent, conchoidally fractured, 
sharp-edged flakes and cores that have the characteristics and 
morphology of intentionally produced hominin tools. The 
production of archaeologically visible cores and flakes is therefore 
no longer unique to the human lineage, providing a comparative 
perspective on the emergence of lithic technology. This discovery 
adds an additional dimension to interpretations of the human 
Palaeolithic record, the possible function of early stone tools, 
and the cognitive requirements for the emergence of stone 
flaking. 

Palaeoanthropologists use the distinctive characteristics of flaked 
stone tools both to distinguish them from naturally broken stones 
and to interpret the behaviour of the hominins that produced them‘. 
Suggested hallmarks of the earliest stone tool technology include 
(i) controlled, conchoidal flaking*;(ii) production of sharp cutting 
edges®; (iii) repeated removal of multiple flakes from a single core; 
(iv) clear targeting of core edges; and (v) adoption of specific flaking 
patterns’. These characteristics underlie the identification of intentional 
stone flaking at all early archaeological sites**”~'?, as they do not 
co-occur under natural geological conditions. 

To date, comparisons between hominin intentional stone flaking 
and wild primate stone tool use have focused on West African 
chimpanzees (Pan troglodytes verus)'*-'®. Nevertheless, stone breakage 
during chimpanzee tool use is accidental!°, a result of missed hits or 
indirect force application during activities such as nut-cracking. The 
resulting stone fragments lack most of the diagnostic criteria listed 
above for hominin flakes!®!”. Even when the manufacture of sharp 
edges was taught to captive bonobos (Pan paniscus), the resulting flaked 
assemblage did not replicate the early hominin archaeological record'®. 

The capuchins of Serra da Capivara National Park (SCNP) in Brazil 
use stone tools in more varied activities than any other known non- 
human primate, including for pounding foods, digging and in sexual 
displays'?-*!. Bearded capuchins and some Japanese macaques (Macaca 
fuscata) are known to pound stones directly against each other”’, but 
the SCNP capuchins are the only wild primates that do so for the 
purpose of damaging those stones!”. This activity, which we term stone 
on stone (SoS) percussion, typically involves an individual selecting 
rounded quartzite cobbles from a conglomerate bed (active hammers), 
and with one or two hands striking the hammer-stone forcefully and 
repeatedly on quartzite cobbles embedded within the conglomerate 
(passive hammers) (Fig. 1, Supplementary Video 1). 

Previous observations of capuchin stone percussion indicate that 
this behaviour occurs in an aggressive context”. In our observations, 
however, the monkeys licked or sniffed the crushed passive hammers 
in about half of the SoS percussion events’? (Supplementary Video 1), 
suggesting that they may be ingesting either powdered quartz or lichens. 
While the stones do not contain any biologically active components, 
silicon is known to be an essential trace nutrient. SCNP capuchins 
have also been seen to use a stone hammer to dislodge another stone 
from the conglomerate, with the second stone then used as a hammer 
for SoS percussion”. 

As well as deliberately crushing the surface of both the active and 
passive hammers, the capuchins regularly unintentionally fracture the 
stones during use (Supplementary Video 1). In addition, we observed 
a capuchin place a newly fractured stone flake on top of another stone, 
and then strike it with a hammer in a manner resembling chimpanzee 
nut-cracking or human bipolar reduction (Supplementary Video 1). 
Nevertheless, while the monkeys were seen to re-use broken 
hammer-stone parts as fresh hammers, they were not observed using 
the sharp edges of fractured tools to cut or scrape other objects. 

We collected fragmented stones immediately after capuchins 
were observed using them at the Oitenta site in SCNP (8° 52.394’ S, 
42° 37.971’ W) (Fig. 1), as well as from surface surveys and archaeolo- 
gical excavation in the same area (Extended Data Fig. 1). The assemblage 
consists of 111 capuchin-modified stone artefacts, including complete 
and broken hammer-stones, complete and fragmented flakes, and 
passive hammers. We also found flaked hammer-stones, which using 
a traditional classification would be considered flaked artefacts 
(Extended Data Table 1). All stones were originally obtained by the 
capuchins from conglomerates in the vicinity of their use. 

Complete hammer-stones have a mean weight of 600.3 g (Extended 
Data Table 2a). They possess varying degrees of percussive damage 
across their surfaces, including small impact points surrounded by 
circular or crescent scars (Supplementary Information and Extended 
Data Fig. 2). Broken hammer-stones and flaked hammer-stones 
comprise over a quarter of the total assemblage. Broken hammer-stones 
are on average smaller than complete hammer-stones (mean = 203.8 g; 
Extended Data Table 2a), and some would be termed split cobbles in 
a hominin assemblage. Flaked hammer-stones exhibit one or more 
conchoidal or wedge flake scars, occurring either as 1–2 fortuitous 
scars from a natural striking platform, or as recurring unidirectional, 
overlapping flakes resulting from repeated strikes on a fracture plane 
(Fig. 2, Supplementary Information and Extended Data Fig. 3). Refitted 
hammer-stones demonstrate this reduction sequence (Supplementary 
Information and Extended Data Figs 4, 5). Continuous rotation and 
manipulation of the hammer-stones during use also produces small 
(<1 cm), non-invasive, step-terminating flake scars along the edge 
of the striking platform, perpendicular to the flaking surface. These 
artefacts are indistinguishable from some archaeological examples 
of intentionally flaked early hominin stone cores. Using a traditional 
classification, the flaked hammer-stones fall within the morphology 
of unifacial choppers. 

Figure 1 | Wild bearded capuchin SoS percussion, Serra da Capivara National Park, Brazil. a, The conglomerate outcrop where SoS percussive behaviour of b and c was observed. b, c, SoS percussive actions including close observation by a juvenile capuchin (b), and stone breakage (c). Note that the active hammer in use is part of Refit Set 6 (Supplementary Information and Supplementary Video 1). 

Figure 2 | Examples of flaked stones from capuchin SoS percussion. a, Detail of a large, unidirectionally flaked active hammer-stone, with clear impact marks located towards the centre of the striking platform. b, Refitted active hammer illustrating recurrent unidirectional removal of at least seven flakes (Refit Set 6; Extended Data Fig. 6b and Supplementary Video 2). c, e, Examples of conchoidal flakes. Artefact illustrations in e reproduced with permission from A. Theodoropoulou. d, f, Examples of flaked hammer-stones. a–f, Scale bars are 5 cm, except for the scale bar in the inset (a), which is 2 mm. 

Complete flakes produced during SoS percussion have sharp edges, 
bulbs of percussion and scars from up to three previous flake removals 
(Fig. 2, Supplementary Information and Extended Data Fig. 6). 
A high proportion of wedge-initiated flakes occur in the early 
stages of reduction, evidenced by an increased frequency of cortical 
flakes. Conchoidal flakes, on the other hand, come from both early 
and later stages of reduction, with both cortical and non-cortical 
pieces represented. Extensive refits record the production of unidirectional 
recurrent, conchoidal flakes following an initial forceful 
fracture (Extended Data Figs 5, 6, Supplementary Information and 
Supplementary Video 2). 

Figure 3 | Examples of passive hammers from capuchin SoS percussion. a, b, Passive hammers with detail of percussive damage (inset). c, Passive hammer in situ at Serra da Capivara National Park, after its observed use for SoS percussive behaviour. Note the small flake fragments at the base of the passive element, resulting from active hammer flaking. a–c, Main scale bars are 5 cm, the scale bars in the insets (a, b) are 1 cm. 

Passive hammers, whether found detached from or embedded in 
the conglomerate, typically have a localized area of percussive damage 
located on a prominent surface (Fig. 3). The damage includes impact 
points, battering marks and crushed quartz crystals and, in some 
cases, detached flakes or chips. The passive hammers in this study 
(mean = 303.7 g, Extended Data Table 2a) also retain evidence of their 
subsequent re-use as active hammers, with impact points located on 
previously embedded flat planes opposite the passive hammer damage. 
This use clearly occurred after the stone was dislodged from the 
conglomerate. Capuchin SoS tools are therefore multifunctional, 
with the monkeys able to repurpose stones from a passive to an active 
percussive role (Supplementary Information). 

The distinctive assemblages found at SoS percussion sites will guide 
future archaeological investigations into the development of capuchin 
technology at SCNP”®, and the broader Middle Pleistocene dispersal of 
Sapajus into northeast Brazil’’. They should also assist in distinguishing 
human tools from capuchin artefacts where the ranges of these primates 
overlap’”. Of interest beyond Sapajus behavioural evolution, SCNP 
capuchins produce stone debris through a similar technique (passive 
hammer) to that inferred from some of the earliest hominin archae- 
ological assemblages*!’. The passive hammer knapping technique 
involves striking a hammer-stone onto a passive anvil, with the desired 
flakes detached from the hand-held stone!! (Supplementary Video 1). 
Both active and passive hominin hammers often have repeated 
impact marks away from the tool's edge, interpreted as evidence of 
poorly controlled strikes or multi-purpose tool use’. SCNP capuchin 
behaviour demonstrates that these marks and recurrent conchoidally 
fractured, sharp-edged flakes, can be produced entirely unintentionally. 

The SCNP data provide an example of repeated conchoidal flaking 
that is not reliant on advanced, human-like hand morphologies and 
coordination. Similarly, SoS behaviour presents an alternative to 
evolutionary explanations that link the origins of recurrent flake 
production to a change in hominin cognitive skills**”*. In the absence 
of supporting evidence such as cut-marked bones, we suggest that 
sharp-edged flake production can no longer be implicitly or solely 
associated with intentional production of cutting flakes. Capuchin 
SoS percussion and simple Pliocene—Pleistocene stone knapping 
activities are equifinal behaviours in the production of flaked lithic 
assemblages. These findings open up the possibility that unintentional 
flaked assemblages may be identified in the palaeontological record of 
extinct apes and monkeys. In light of this possibility, criteria commonly 
used to distinguish intentional hominin lithic assemblages need to 
be refined. 

No living primate is a direct substitute for extinct hominins, 
which varied in unknown ways from the behaviour, cognition and 
morphology seen in extant animals and humans!°. However, capuchin 
SoS percussion is an example of intentional stone breakage by a 
non-human primate that produces concentrated lithic accumulations. 
Capuchin SoS percussion flakes and flaked hammer-stones fall within 
the range of mean dimensions for simple flakes and cores from the 
Early Stone Age* (Supplementary Information and Extended Data 
Table 2b). If encountered in a hominin archaeological context, this 
material would be identified as artefactual, potentially interpreted as 
the result of intentional stone fracture and controlled flake production, 
and probably attributed to functional needs requiring the use of 
sharp edges. 

The capuchin data add support to an ongoing paradigm shift in our 
understanding of stone tool production and the uniqueness of hominin 
technology. Within the last decade, studies have shown that the use*” 
and intentional production? of sharp-edged flakes is not necessarily 
tied to the genus Homo. Capuchin SoS percussion goes a step further, 
demonstrating that the production of archaeologically identifiable 
flakes and cores, as currently defined, is no longer unique to the human 
lineage. 

Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


The SoS percussion assemblage included 111 artefacts collected from surface and 
archaeological capuchin activity locations in Serra da Capivara National Park 
(SCNP), Piauí, Brazil. The surface collection (Lasca OIT surface; n = 60, 54.1%) 
was produced by capuchins observed performing SoS percussion in September 
2014, at a site later designated Lasca Oitenta 2 (Lasca OIT 2). The capuchins belong 
to the Jurubeba group, which was first studied in March 2004 (ref. 20). SoS activity 
primarily took place on a low (approximately 1 m high), narrow conglomerate 
ridge associated with a much larger conglomeratic outcrop (Fig. 1; Supplementary 
Video 1). During this time a portion of the used assemblage dropped to the ground 
immediately below the activity area, and was collected once the activity ceased. 
Additional material was collected during surface surveys within the immediate 
vicinity of Lasca OIT 2, at locations where isolated conglomerate blocks were used 
by the same capuchin group for SoS percussion. This material was also analysed 
as Lasca OIT surface. 
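
The assemblage shares quoted in these Methods (60 surface artefacts here, plus the 28 and 23 excavated artefacts reported in the following paragraph) can be checked against the total of 111. This is a small arithmetic sketch; the dictionary keys and function name are illustrative labels only.

```python
# Artefact counts as reported in the Methods text
ARTEFACT_COUNTS = {
    "Lasca OIT surface": 60,
    "Lasca OIT 2 excavation": 28,
    "Lasca OIT 1 excavation": 23,
}


def assemblage_shares(counts):
    """Return the total count and each subset's percentage share (one decimal)."""
    total = sum(counts.values())
    shares = {name: round(100.0 * n / total, 1) for name, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = assemblage_shares(ARTEFACT_COUNTS)
    print(total)
    for name, pct in shares.items():
        print(f"{name}: {pct}%")
```

The three subsets sum to 111 and reproduce the reported 54.1%, 25.2% and 20.7%.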

The archaeological material comes from two excavations conducted in June 
2015 (Extended Data Fig. 1), within the Jurubeba group range: Lasca OIT 1 
(8° 52.460’ S, 42° 37.977' W) and Lasca OIT 2 (8° 52.394’ S, 42° 37.971’ W). We 
excavated both sites by hand in 5-cm levels, and sieved all sediment through a 
5 mm mesh. Sediments at both sites consisted of light-brown, silty sand, with 
gravel to cobble-sized inclusions, resulting from the in situ weathering of local 
conglomerates. We distinguished capuchin tools from natural stones on the basis of 
percussion marks and flaking features as described in the main text and below. The 
Lasca OIT 2 excavation (Extended Data Fig. 1b) can be considered an extension of 
the surface material collected in 2014 from the same site. An area of 3 m² excavated 
to a maximum depth of 0.5 m yielded 28 SoS percussion artefacts (25.2%) at Lasca 
OIT 2. We excavated Lasca OIT 1 (Extended Data Fig. 1a), located 120 m southwest 
of Lasca OIT 2, beneath the sheer face of an approximately 7 m high conglomerate 
outcrop that showed percussion marks indicative of previous SoS activity. A total 
excavated area of 3 m² to a maximum depth of 0.4 m yielded 23 artefacts (20.7%) 
at this site. We did not find human material, such as hearths, ceramic pieces, metal 
objects, or ground stone at either site. Such items are ubiquitous in anthropogenic 
sites elsewhere in SCNP*!. This absence, along with direct observation of capuchins 
creating the flaked surface assemblage, and the identical nature of the damage and 
size of the recovered stones to those observed in use by capuchins, rules out human 
production of the archaeological material. 
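
The stated separation of the two excavations can be compared with the distance implied by their coordinates. The sketch below converts the degree and decimal-minute coordinates to decimal degrees and applies the haversine formula; the mean Earth radius and all function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed value)


def to_decimal_degrees(degrees, minutes, hemisphere):
    """Convert degrees + decimal minutes to signed decimal degrees (S and W negative)."""
    value = degrees + minutes / 60.0
    return -value if hemisphere in ("S", "W") else value


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Coordinates as given in the Methods
OIT1 = (to_decimal_degrees(8, 52.460, "S"), to_decimal_degrees(42, 37.977, "W"))
OIT2 = (to_decimal_degrees(8, 52.394, "S"), to_decimal_degrees(42, 37.971, "W"))

if __name__ == "__main__":
    print(f"{haversine_m(*OIT1, *OIT2):.0f} m")
```

The result is a little over 120 m, consistent with the separation given in the text, with Lasca OIT 1 lying south and slightly west of Lasca OIT 2.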

No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

We identified the raw material of each artefact and performed technological 
classification and analysis following commonly used technological attributes”?*?**. 
For full details and definitions of the technological categories used in this analysis, 
see the Supplementary Information. All data are available upon request. 
Extended Data Figure 1 | Archaeological excavation of wild capuchin SoS percussion sites, Serra da Capivara National Park. a, Lasca OIT1 excavation, each square is 1 × 1 m. b, The approach to Lasca OIT2, which is located to the right of the conglomerate cliff face. c, Lasca OIT2 excavation, note the low conglomerate ridge to the left, on which capuchins were observed whilst performing SoS activities. Scale bar, 30 cm (see also Fig. 1). 
Extended Data Figure 2 | Examples of active hammers. a, Crushing impacts on multiple surfaces of an active hammer. b, Examples of impact points and associated circular Hertzian fractures on the surface of an active hammer. Scale bars are 5 cm, except for inset scale bars, which are 2 mm. 




Extended Data Figure 3 | Examples of SoS flaked hammer-stones. a, c, Flake detachment following a transverse active hammer fracture. 
b, Unintentional radial reduction of flaked hammer-stone. d-f, Examples of complete active hammers with scars of accidental flakes. Scale bars 
are 5 cm. 
Extended Data Figure 4 | Refits of flaked hammer-stones showing the repeated detachment of unidirectional flakes. a, Refit Set 1 (artefact numbers JC13 and JF7). b, Refit Set 2 (artefact numbers 225102a and 225102b). c, Refit Set 3 (artefact numbers 224881a and 224881b). d, Refit Set 4 (artefact numbers JF3 and JC5). A, A2, B and C are designated planes on each refit, corresponding to descriptions found in Supplementary Information. Scale bars are 5 cm. 
[Figure key: direction of refit flake; direction of flake scar; refitted piece; flake scar; impact point associated with flake scar; impact point associated with refitted flake.] 
Extended Data Figure 5 | Refits of flaked hammer-stones showing the repeated detachment of unidirectional flakes and continued use of broken active hammers. a, Refit Set 5 (artefact numbers JC11, JC12, JF23 and JF1). b, Refit Set 6 (artefact numbers JC6, JF2, JF14, JF4 and JF8) (see also Supplementary Video 2). c, Refit Set 7 (artefact numbers JC4 and JC10). A, A2, B, B2, C and C2 are designated planes on each refit, corresponding to descriptions found in Supplementary Information. Scale bars are 5 cm. 
[Figure key: direction of refit flake; direction of flake scar; refitted piece; flake scar; impact point associated with flake scar; impact point associated with refitted flake.] 
Extended Data Figure 6 | Examples of complete flakes. a–f, Examples of complete flakes detached during capuchin SoS percussion. Scale bars are in cm. Scale bars are 5 cm. 
Extended Data Table 1 | Absolute and relative frequencies and total weights (g) of technological categories identified in each Capuchin SoS assemblage, Serra da Capivara National Park 
Assemblages: Lasca OIT Surface; Lasca OIT 1 Excavation; Lasca OIT 2 Excavation 
Technological Category: Complete Hammerstone; Broken Hammerstone; Flaked Pieces; Complete Flake; Fragmented Flake; Chunk; Small Debris; Passive Element 
Reported for each category: N, relative frequency and Total Weight (g) 
Extended Data Table 2 | Dimensional data for all artefacts from Capuchin SoS assemblages and a comparison with Pliocene-Pleistocene hominin artefacts 

a, Assemblages (left to right): Lasca OIT Surface, Lasca OIT 1 Excavation, Lasca OIT 2 Excavation 
Technological Category  Measure  Min Max Mean StDev  Min Max Mean StDev  Min 
Hammerstone Max Length (mm) = = - - 75.00 129.00 101.15 19.35 61.70 
Max Width (mm) - - - - 5390 93.70 77.16 12.02 4880 

Max Thickness (mm) - a - - 3970 7490 54.63 11.05 40.00 
Weight (9) - - - - 261.50 92420 575.30 231.89 155.40 

Broken Max Length(mm) 4816 92.58 77.33 20.24 7510 10170 8840 18.81 50.90 
Hammerstone Max Width(mm) 30.91 61.66 43.65 13.92 4360 67.10 55.25 16.62 2800 
Max Thickness (mm) 1887 42.87 34.46 11.03 4210 57.60 49.8 10.96 14.00 

Weight (9) 37.80 286.30 174.48 102.87 187.40 383.00 285.20 138.31 2690 

Flaked Pieces Max Length(mm) 4297 81.52 64.81 12.22 8550 13750 101.23 24.45 43,00 
Max Width (mm) 3552 57.01 44.98 694 4130 99.50 67.08 24.52 3200 

Max Thickness (mm) 23.38 45.29 35.79 848 3390 77.80 56.33 1811 2250 

Weight (9) 48.90 247.00 129.69 53.44 13670 1101.00 538.40 412.77 6260 

Complete Flakes = Max Length(mm) 1480 70.98 34.95 1669 2990 63.60 44.34 11.17 3020 
Max Width (mm) 600 45.70 2248 11.72 2370 44.70 31.54 802 1650 

Max Thickness (mm) 160 27.56 12.03 7.34 690 249 1357 654 11.10 

Weight (9) 40 44.70 15.18 1615 480 59.20 23.87 23.79 990 

Fragmented Flakes Max Length(mm) 13.04 41.75 21.32 949 7930 79.30 79230 - = 
Max Width (mm) 705 24.42 13.37 493 2860 2860 2860 - = 

Max Thickness(mm) 371 1992 696 464 1460 1460 1460 - = 

Weight (g) 40 26.00 348 718 2700 27.00 27.00 - = 

Chunk Max Length(mm) 1263 S7.31 26.74 1333 - « - - 70.40 
Max Width (mm) 989 55.32 1948 1239 - - - - 3800 

Max Thickness(mm) 7.01 33.54 1360 738 - s : - 25.00 

Weight (9) 70 83.00 13.72 2410 - - - - 51.10 

Small Debris Max Length(mm) 1403 14.38 14.21 25 7 2 - 2 - 
Max Width (mm) 622 8% 7.29 151 - - - - - 

Max Thickness(mm) 523 811 667 204 - = = = - 

Weight (9) 6 80 70 14 = : : 2 = 

Passive Hammers Max Length (mm) : - - - 9020 90.20 90.20 - 853 
Max Width (mm) z - 2 - 6870 6870 6870 - 537 

Max Thickness (mm) - - : - 5050 50.50 5050 - 381 

Weight (g) - - 7 - 39830 39830 39830 - 2091 

Natura¥Unmodified Max Length (mm) = _ 2 = es 4 = = 84.20 
Max Width (mm) = = * - = = = - 7830 

Max Thickness (mm) - 5 = = = = = 35.70 
Weight (g) 7 - - = 7 - 7 - 271.50 

B Length (mm) Width (mm) 
Site a N Mean Std Min Max Mean Std Min 
Flakes

LOM3 33 26 120.0 4880 19 205 1101 40.70 19 

OGS7 26 #73 391 1430 13 80 37.1 1410 13 

EG10 26 «114 «9374 1534 14 78 346 13.74 14 

EG12 26 62 345 1284 15 6 356 1323 19 

ALS 2% 104 359 2363 6 14 251 1757 2 

LA2C 2% 500 380 15.00 12 9 350 1400 7 

Omos7 24% 44 248 1055 10 58 204 685 10 

Omo123 23% 110 208 750 7 50 178 649 6 

OK >184 115 40.2 1480 18 110 (37.4 11.22«17 

FLKZinj af 125 368 1213 16 82 329 1159 4 

SCNP WA 31 335 15.80 148 70.98 265 1242 6 

Cores 

LOM3 33 83 167.0 2340 132 260 1478 23.10 9% 

OGS7 26 7 441 1368 2 67 590 854 45 

EG10 26 16 833 1034 © 105 609 918 44 

EG12 26 7 745 872 58 93 597 806 49 

AL894 2% 38 75.0 30.32 1931 1363 553 254 1221 

LA2C 2% 70 660 1800 3% 13 520 1400 2 


a, Dimension data for all technological categories identified in this study. b, Metric comparison of SCNP capuchin SoS percussion flakes and flaked hammerstones with hominin
Pliocene-Pleistocene flake and core dimensions. Data and table adapted from Harmand et al. (2015).
Evolution of Hoxa11 regulation in vertebrates is
linked to the pentadactyl state 


Yacine Kherdjemil, Robert L. Lalonde, Rushikesh Sheth, Annie Dumouchel, Gemma de Martino, Kyriel M. Pineault,
Deneen M. Wellik, H. Scott Stadler, Marie-Andrée Akimenko & Marie Kmita


The fin-to-limb transition represents one of the major vertebrate
morphological innovations associated with the transition from
aquatic to terrestrial life and is an attractive model for gaining
insights into the mechanisms of morphological diversity between
species. One of the characteristic features of limbs is the presence
of digits at their extremities. Although most tetrapods have
limbs with five digits (pentadactyl limbs), palaeontological data
indicate that digits emerged in lobed fins of early tetrapods,
which were polydactylous. How the transition to pentadactyl
limbs occurred remains unclear. Here we show that the mutually
exclusive expression of the mouse genes Hoxa11 and Hoxa13,
which were previously proposed to be involved in the origin of the
tetrapod limb, is required for the pentadactyl state. We further
demonstrate that the exclusion of Hoxa11 from the Hoxa13 domain
relies on an enhancer that drives antisense transcription at the
Hoxa11 locus after activation by HOXA13 and HOXD13. Finally,
we show that the enhancer that drives antisense transcription of
the mouse Hoxa11 gene is absent in zebrafish, which, together with
the largely overlapping expression of hoxa11 and hoxa13 genes
reported in fish, suggests that this enhancer emerged in the course
of the fin-to-limb transition. On the basis of the polydactyly that we
observed after expression of Hoxa11 in distal limbs, we propose that
the evolution of Hoxa11 regulation contributed to the transition
from polydactyl limbs in stem-group tetrapods to pentadactyl limbs
in extant tetrapods.

Several studies provided evidence for the implication of Hox genes in
the fin-to-limb transition, notably Hoxa13 and Hoxd13 (Hox13
hereafter), which are required for digit morphogenesis. Comparison of
their expression pattern in fin and limb buds revealed a significant
expansion of the Hox13 domain in distal limbs, and engineered
enlargement of the Hoxd13 domain in fish resulted in more chondrogenic
tissue forming distally as well as fin fold reduction, that is,
morphological changes associated with the fin-to-limb transition. It
was thus proposed that the evolution of Hox13 regulation has likely
been instrumental to the emergence of the characteristic feature of
the limb, that is, the digits. In mice, this regulation relies on a
series of remote transcriptional enhancers, and although a subset of
these enhancers exists in fish, the expansion of the Hox13 domain in
limb was probably associated with the emergence of tetrapod-specific
enhancers during the fin-to-limb transition. Another notable difference
is the mutually exclusive expression of Hoxa11 and Hoxa13 in tetrapod
limbs, contrasting with their largely overlapping expression in
fins. Two hypotheses have been put forward to explain how Hoxa11
gets proximally restricted in tetrapod limbs. One hypothesis suggested
a Hoxa13-dependent repression of Hoxa11 in the presumptive
autopod, whereas the second proposed that antisense transcription
at the Hoxa11 locus prevents expression of the gene distally.
The functional importance of the mutually exclusive expression of
Hoxa11 and Hoxa13 in tetrapod limbs, however, is unknown.

Previous chromatin conformation analyses revealed that, in distal
limbs, 5' HoxA genes (that is, Hoxa9 to Hoxa13) are grouped within a
chromatin sub-topological domain (sub-TAD) interacting with sub-TADs
containing distal limb enhancers. Yet, although Hoxa10 and
Hoxa13 are both expressed distally, Hoxa11 expression is proximally
restricted (Fig. 1a-c), suggesting that Hoxa11 is part of the distal limb
regulatory landscape, but a specific, yet unknown, mechanism prevents
its expression distally. To test this possibility, we first took
advantage of a mouse line in which the Hoxa11 gene is replaced by
a PGK-neomycin resistance cassette, which we used as a reporter
transgene. We found neomycin expression in distal limbs (Fig. 1d),
indicating that Hoxa11 proximal restriction is linked to specific
feature(s) of the gene itself. We next analysed the putative implication of
antisense long non-coding RNAs previously identified at the Hoxa11
locus and robustly expressed in the distal limb bud (Fig. 1e).
Among the distinct Hoxa11 antisense transcripts (Hoxa11as, also
known as Hoxa11os), two initiate upstream of the Hoxa11 gene and
are thus non-overlapping with Hoxa11 (Hoxa11as-a; Fig. 1e) and the
other two initiate within Hoxa11 exon 1 (Hoxa11as-b; Fig. 1f). Notably,
only the Hoxa11as-b expression pattern is mutually exclusive with the
Hoxa11 expression domain (Fig. 1f, compare with Fig. 1b). To test whether
antisense transcription overlapping with Hoxa11 exon 1 prevents Hoxa11
expression distally, we took advantage of the Hoxa11^eGFP mutant line,
which lacks Hoxa11as-b start sites as the enhanced green fluorescent
protein (eGFP) coding sequence replaces most of Hoxa11 exon 1
(ref. 24). This mutation disrupted antisense transcription normally
initiating 3' to the Hoxa11 promoter (Extended Data Fig. 1a, b) while gfp
expression driven by the Hoxa11 promoter was present both in the
proximal and distal domains (Fig. 1g). By contrast, ectopic expression
of Hoxa11as-b in the entire limb had no effect on Hoxa11 expression
(Extended Data Fig. 1c-e), thereby excluding a trans-acting effect of
Hoxa11as-b on Hoxa11 expression. Together, our data suggest that
Hoxa11 distal repression is due to the antisense transcription event or
the antisense Hoxa11as-b transcripts acting in cis.

Previous mapping of active enhancers in distal limbs (referred to
as 'digit' enhancers hereafter) uncovered a putative 'digit' enhancer
embedded in the Hoxa11 intron. We thus proposed that this enhancer
might control Hoxa11as-b expression. We first tested the transcriptional
enhancer activity of this DNA region in transgenic embryos
and confirmed its ability to act as a transcriptional enhancer in distal
limbs (Fig. 2a). Next, we generated mutant mice lacking this enhancer
(Hoxa11^ΔInt/ΔInt; Extended Data Fig. 2) to examine its potential
implication in Hoxa11as-b expression. Analysis of antisense transcription in
Hoxa11^ΔInt/ΔInt limbs showed no detectable expression of Hoxa11as-b
in the most distal cells (Fig. 2b, c), indicating that in these cells, the
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Figure 1 | The proximal restriction of Hoxa11 is linked to antisense 
transcription at the Hoxa11 locus. a—c, Expression of Hoxa10 (a), Hoxa11 
(b) and Hoxa13 (c) in wild-type limb bud from embryonic day (E) 11.5 
mouse. d, Replacement of the Hoxa11 gene with the PGK-neomycin
cassette (Hoxa11^Neo; scheme to the left) results in neomycin expression
both in the proximal and distal domains. e, f, Expression of all antisense 
transcripts (e) and antisense transcripts overlapping with Hoxa11 exon 1 
(f) in E11.5 wild-type limb. Schemes of the antisense transcripts and the 
probes used (blue boxes) are on the left. Note that the antisense transcripts 
overlapping with Hoxa11 exon 1 (Hoxa11as-b) are distally restricted (f), 
reminiscent of Hoxa13 expression (c) and mutually exclusive with the 
Hoxa11 pattern (b). g, Deletion of the antisense transcript start sites in 
Hoxa11 exon 1, via replacement of most of exon 1 with the eGFP coding 
sequence (Hoxa11^eGFP; scheme to the left) and expression of gfp under the
control of the Hoxa11 promoter (right). Original magnification, ×31.5
(for all images).
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identified enhancer is required for antisense transcription overlapping 
with Hoxa11 exon 1. Some Hoxa11as-b expression remained in prox- 
imal cells of the presumptive handplate (presumptive carpal region; 
Fig. 2c), which suggests that additional cis-regulatory element(s) trig- 
ger antisense transcription in these cells. Notably, the deletion of the 
enhancer abrogating Hoxa11as-b expression in the most distal cells 
also resulted in ectopic expression of Hoxa11 in the presumptive digits 
(Fig. 2d, e). The gain-of-sense transcription in Hoxa11^eGFP/eGFP distal
limbs (Fig. 1g) indicates that it is not the intronic regulatory region per 
se but Hoxa11as-b expression or the antisense transcription event that 
represses Hoxa11 expression distally. 

Analysis of the enhancer sequence revealed several putative binding
sites for HOXA13, the expression of which occurs in digit progenitor
cells and is required in conjunction with HOXD13 for digit
morphogenesis. Chromatin immunoprecipitation followed by
high-throughput sequencing (ChIP-seq) indicated that, in distal limb cells,
HOXA13 as well as HOXD13 bind to the identified enhancer (Extended
Data Fig. 3a). Moreover, a transcription assay in 293T cells shows
that HOXA13 has a positive effect on the enhancer activity (Extended
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Figure 2 | Deletion of the distal enhancer in Hoxa11 intron results in 
impaired antisense transcription and gain of sense transcription in 
distal cells. a, Left, scheme of the Tg(m-Inta11-LacZ) transgene carrying
the predicted distal enhancer (Int, red box). Right, X-gal staining of E12.5
transgenic embryos (n = 5). b-e, Expression of Hoxa11as-b (b, c) and
Hoxa11 (d, e) in wild-type (WT; b, d) and Hoxa11^ΔInt/ΔInt (c, e) mouse
limbs at E12.5. Note that based on the observed gain of Hoxa11 expression,
other regulatory input(s) could be implicated in Hoxa11 regulation in
distal cells. Pr, minimal promoter. Original magnification, ×31.5 (for all
images).


Data Fig. 3b). Together, these results raised the possibility that
distal Hoxa11 antisense transcription relies on HOX13. We thus
analysed Hoxa11 antisense transcription in the Hoxa13;Hoxd13 allelic
series. We used the probe recognizing all antisense transcripts such
that expression in the proximal limb, where Hox13 genes are not
expressed, served as internal control. We found that although antisense
transcription is barely modified in single mutants (Extended
Data Fig. 4), it markedly decreases in the Hoxa13-/- Hoxd13+/-
mutant (Fig. 3c, compare to Fig. 3a), and is completely abrogated in
Hoxa13-/- Hoxd13-/- distal limbs (Fig. 3e). Analysis of the
distal-specific antisense transcripts (Hoxa11as-b) confirmed that distal
antisense transcription requires HOX13 function (Extended Data Fig. 5).
Importantly, concomitant with the abrogation of antisense transcription,
Hoxa11 expression was gained distally (Fig. 3d-f, compare with
Fig. 3b), consistent with the requirement of antisense transcription for
Hoxa11 proximal restriction.

To assess the functional significance of the HOXA13/D13-mediated
repression of Hoxa11, we investigated the phenotypic outcome of distal
Hoxa11 expression. Although the deletion of the enhancer driving
antisense transcription results in Hoxa11 expression in distal limbs,
the deletion extends up to the exon 1-intron boundary, thereby
precluding the use of this mutant line to assess the phenotype resulting
from distal Hoxa11 expression. We thus generated a Hoxa11 conditional
gain-of-function allele (Rosa26^Hoxa11; Extended Data Fig. 6) to
express Hoxa11 ectopically and distally. We found that embryos carrying
the Rosa26^Hoxa11 allele and either Hoxa13:Cre (ref. 25) or Prx1:Cre
(ref. 26) have limbs with extra digits (Fig. 3g, h), including postaxial
extra digits (arrow in Fig. 3h and Extended Data Fig. 7). While some
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Figure 3 | Hox13 inactivation disrupts Hoxa11 antisense transcription 
in distal cells and distal Hoxa11 expression results in the formation
of supernumerary digits. a-f, Hoxa11as (probe A) (a, c, e) and
Hoxa11 (b, d, f) expression in E11.5 limb buds from wild-type (a, b),
Hoxa13-/- Hoxd13+/- (c, d) and Hoxa13-/- Hoxd13-/- (e, f) mouse
embryos. Arrows in c and d show the group of cells still expressing
Hoxa11as in Hoxa13-/- Hoxd13+/- limbs (c), which corresponds to
distal cells in which Hoxa11 expression is not gained (d). g, h, Skeleton of
Rosa26^Hoxa11/Hoxa11 (g) and Prx1Cre; Rosa26^Hoxa11/Hoxa11 (h) distal forelimb at
postnatal day 0 (P0). Anterior is up. Original magnification, ×31.5 (a-f)
and ×20 (g, h).


variations in the digit phenotype were observed among individuals,
all homozygous mutants analysed were polydactylous (Extended Data
Fig. 7c-e). Increased expression of Hoxd11 in the presumptive autopod
in the absence of Hoxd13 also resulted in polydactyly, whereas
a similar gain of Hoxd10 or Hoxd12 had no effect on digit number.
These data raise the possibility that the formation of extra digits upon
ectopic expression of Hoxa11 or Hoxd11 distally reflects the divergence
between Hoxa11/Hoxd11 targets and those of the other 5' HoxA/D
genes. Notably, the evidence that Hoxa11 expression in the distal limb
results in the formation of extra digits indicates that the proximal
restriction of Hoxa11 expression is required for the pentadactyl state.
In contrast to the mutually exclusive Hoxa11 and Hoxa13 pattern
in tetrapod limbs, hoxa11 and hoxa13 gene expression is largely
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Figure 4 | The mouse Hoxa11 antisense enhancer is functional in distal 
fins. a, mVISTA sequence conservation plot of the mouse Hoxa11 intron 
(red) with tetrapod (rat, human, chicken and frog) and fish representatives 
(coelacanth and zebrafish). Ex1, exon 1; Ex2, exon 2. Note that zebrafish 
has two hoxa11 genes expressed in developing fins, hoxa11a and hoxa11b.
b, c, GFP expression in fin buds of Tg(m-Inta11-eGFP) transgenic
zebrafish embryos at 60 hpf (b) and 72 hpf (c), revealing the enhancer
activity of the mouse Hoxa11 intron in fish. Note the filopodia-like
protrusions in GFP+ mesenchymal cells suggestive of a migration towards
the fin fold. d, e, hoxa13a expression in developing fins at 60 hpf (d) and
72 hpf (e). Original magnification, ×400.


overlapping in zebrafish fins (Extended Data Fig. 8) as well as in
other teleosts (the medaka Oryzias latipes) and in fish models of
both chondrichthyans (Scyliorhinus canicula) and basal actinopterygians
(Polyodon spathula). The HOXA13/D13-mediated repression of
Hoxa11 identified in distal limb cells was thus probably implemented
after the separation of actinopterygians and chondrichthyans, during
the evolution of vertebrates towards tetrapod species. Consistent with
this hypothesis, no hoxa11 antisense transcription has been reported in
fish (Extended Data Fig. 9). Moreover, sequence comparison of the
mouse Hoxa11 intron showed robust conservation among tetrapods,
whereas considerably weaker sequence conservation was observed
with fish hoxa11 orthologues (Fig. 4a). To examine whether the lack
of hoxa11 antisense transcription in fish could be due to the absence of
a distal enhancer within the hoxa11 intron, we tested the zebrafish hoxa11a
and hoxa11b intronic sequences for potential enhancer activity using
transgenic reporter assays in both zebrafish and mice. Neither the
hoxa11a nor the hoxa11b intron was capable of triggering expression of a
reporter gene in fin or mouse limb buds (Extended Data Table 1),
indicating that there is no distal enhancer in either the hoxa11a or the
hoxa11b intron. By contrast, when we tested the transcriptional activity of the
mouse Hoxa11 intron in zebrafish, the analysis of four stable transgenic
lines revealed that the mouse Hoxa11 intron was able to drive
reporter gene expression in the pectoral fin mesenchyme (Fig. 4b, c).
At 60 hours post-fertilization (hpf), eGFP-positive cells were present
at the distal rim of the endoskeletal disc and migrating into the fin fold
(Fig. 4b) and by 72 hpf most eGFP-positive cells were found in the fin
fold mesenchyme (Fig. 4c). The expression of the reporter transgene
was reminiscent of hoxa13a expression at 60 hpf (Fig. 4d) and 72 hpf
(Fig. 4e), indicating that the mouse enhancer in the Hoxa11 intron was
active in the Hoxa13 domain also in zebrafish. Together, our data indicate
that all the transcription factors required for the activity of the
mouse enhancer are present in zebrafish fins, and that the enhancer
driving Hoxa11 antisense transcription does not exist in the intron of
the zebrafish hoxa11a and hoxa11b genes. We therefore propose that
the emergence of the enhancer triggering Hoxa11 antisense transcription,
and thus distal repression of Hoxa11, occurred in the course of
evolution towards tetrapod species.

In summary, our work reveals that the mutually exclusive expression 
of Hoxa11 and Hoxa13 in tetrapods is associated with the emergence 
of a transcriptional enhancer in Hoxa11 intron, which upon HOXA13/ 
D13-dependent activation, triggers antisense transcription and thereby 
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prevents Hoxa11 expression distally. On the basis of the evidence that 
this HOX13-mediated regulation of Hoxa11 probably emerged during
the fin-to-limb transition and the polydactyly resulting from distal
expression of Hoxa11 in mice, we propose that the evolution of Hoxa11 
regulation has contributed to the transition from polydactyly in stem- 
group (extinct) tetrapods to pentadactyly in extant tetrapods. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 

No statistical methods were used to predetermine sample size. 

Mouse lines. The Hoxa11^Neo, Hoxa11^eGFP, Hoxa13null and Hoxd13null
mouse lines were previously described.

The Rosa26^Hoxa11 knock-in allele was constructed as follows: the PacI-AscI
fragment from pBTG (Addgene plasmid 15037) was inserted into the previously
described Rosa26 targeting vector pROSA26Am1 (Addgene plasmid
15036). The mouse Hoxa11 cDNA was inserted at the SmaI site within the
MCS. The vector was linearized by SwaI digest prior to electroporation into
embryonic stem (ES) cells. After double selection using G418 and DTA negative
selection, 96 ES cell clones were analysed by Southern blot for homologous
recombination. Two independent clones were injected into blastocysts
obtained from C57BL/6J mice, which were subsequently implanted into
pseudo-pregnant females. After germline transmission of the Rosa26^Hoxa11
allele, mice and embryos were genotyped by Southern blot (a scheme with
restriction sites and probes used is presented in Extended Data Fig. 6) and PCR.
The following PCR primers were used: fw_wt: 5'-GCAATACCTTTCTGGGAGTTCT-3',
rev_wt: 5'-TCGGGTGAGCATGTCTTTTAATC-3', rev_flox: 5'-TTCAATGGCCGATCCCATATT-3',
rev_del: 5'-AGGTTGGAGGAGTAGGAGTATG-3'. Wild-type band: 384 bp, flox band:
881 bp, del band: 583 bp. The moderate transcription
resulting from the Rosa26 promoter allowed for Rosa26^Hoxa11 expression at a level
comparable to the Hoxa11 gain observed in our series of mutants.
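
As an illustration of how the three band sizes above can be read off a gel, the following is a minimal sketch in Python. The band sizes (384, 881 and 583 bp) come from the text; the function name `call_alleles` and the 20-bp matching tolerance are hypothetical choices made for this example and are not part of the published protocol.

```python
# Expected PCR product sizes (bp) for the Rosa26 knock-in genotyping, as listed in the text.
EXPECTED_BANDS = {"wt": 384, "flox": 881, "del": 583}


def call_alleles(observed_bp, tolerance_bp=20):
    """Return the sorted names of alleles whose expected band matches an observed band.

    tolerance_bp is an assumed sizing error for gel reading, not a published value.
    """
    alleles = set()
    for size in observed_bp:
        for allele, expected in EXPECTED_BANDS.items():
            if abs(size - expected) <= tolerance_bp:
                alleles.add(allele)
    return sorted(alleles)


# A heterozygous floxed animal would show both the wild-type and the flox band.
print(call_alleles([384, 881]))  # ['flox', 'wt']
```

Because the three expected products differ by about 200 bp or more, any tolerance well below that keeps the calls unambiguous.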

The Hoxa11^ΔInt mouse line was generated through pronuclear injection of single-guide
RNAs (sgRNAs). We used the CRISPR (http://crispr.mit.edu/) platform to design
sgRNAs flanking the region to delete. Complementary strands were annealed,
phosphorylated and cloned into the BbsI site of the pX330 CRISPR/Cas9 vector (Addgene
plasmid 42230). SgInt1_fw: 5'-CACCGACTCCCCTTTCATAAAGCCC-3';
SgInt1_rev: 5'-AAACGCGCTTTATGAAAGGGGAGTC-3'; SgInt2_fw:
5'-CACCGAGCAACAGGCGAGTTTGCGC-3'; SgInt2_rev:
5'-AAACGCGCAAACTCGCCTGTTGCTC-3'. Mice and embryos were genotyped by Southern
blot (a scheme with restriction sites and probe used is presented in Extended
Data Fig. 2) as well as PCR. The Southern blot probe corresponds to the
ScaI-HpaI fragment in the 3' untranslated region (UTR) of the Hoxa11 gene. Primers
used for PCR genotyping: fw: 5'-GGCCACCTAAGGAAGGAGAG-3'; rev:
5'-GGCTCCGGTGCGTATAAAG-3'.
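
The oligo pairs listed above appear to follow the usual layout for cloning a guide into a BbsI-cut vector: a CACC overhang plus a leading G on the forward strand, and an AAAC overhang plus a trailing C on the reverse-complement strand. The sketch below, in Python, builds such a pair from a 20-nt guide. This layout is inferred from the listed sequences rather than stated in the text, and the function names are hypothetical.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def bbsi_oligos(guide):
    """Build forward and reverse oligos for a guide, assuming the CACCG / AAAC...C layout."""
    guide = guide.upper()
    forward = "CACCG" + guide
    reverse = "AAAC" + reverse_complement(guide) + "C"
    return forward, reverse


# The 20-nt guide embedded in SgInt2_fw reproduces the SgInt2 pair given in the text.
fw, rev = bbsi_oligos("AGCAACAGGCGAGTTTGCGC")
print(fw)   # CACCGAGCAACAGGCGAGTTTGCGC
print(rev)  # AAACGCGCAAACTCGCCTGTTGCTC
```

When the two oligos are annealed, the CACC and AAAC ends remain single-stranded and pair with the overhangs left by BbsI digestion of the vector.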

Three Prx1-Hoxa11as transgenic lines were derived from three distinct founders
obtained from pronuclear injection of the Prx1-Hoxa11as transgene. The
Prx1-Hoxa11as transgene carries the Prx1 promoter upstream of the mouse
Hoxa11as (GenBank: U20367.1 and U20366.1), and the SV40 polyadenylation
sequence was inserted downstream of Hoxa11as. Embryos were genotyped by PCR
using DNA from the amniotic membrane and the following pairs of primers: fw:
5'-CTTTCTCTCTGGCTCTGATG-3' and rev: 5'-GACAAGAACGCCGAGAA-3'
(for U20367.1) or fw: 5'-GTCCGAGGAAAAGGAGGTAG-3' and rev:
5'-GCTCCTCTAACATGTATTTG-3' (for U20366.1).

All mice were of mixed background (C57BL/6 X 129). 

The Tg(m-Inta11-LacZ) transgene was generated by subcloning the mouse
Hoxa11 intron upstream of the Hbb (β-globin) minimal promoter and a
LacZΔCpG NLS reporter. The H19 insulator was inserted upstream of the
Hoxa11 intron. Tg(m-Inta11-LacZ) embryos were produced by pronuclear
injection.

Whole-mount in situ hybridization, X-gal staining, skeletal preparations and
imaging. For skeletal preparation, newborn mice were processed using the standard
alcian blue alizarin red staining protocol (n = 10 for each genotype).

Whole-mount in situ hybridizations were performed using a previously described
protocol and probes (gfp, Neo, Hoxa11, Hoxa13). Embryos were genotyped
prior to in situ hybridization (no blinding). Hoxa11as probes were generated using limb
cDNA and the following primers: fw 5'-AGAGGCGCTGAGGAGCCTTCTC-3'
and rev 5'-GGCCGCTGTGGACACTAGCATATACC-3' (probe A); fw
5'-CCTTCTCGGCGTTCTTGTC-3' and rev 5'-GGCATACTCCTACTCCTCCAACCT-3'
(probe B).

X-gal staining was performed using a standard protocol. Embryos were genotyped
after X-gal staining (so staining was assessed blind to genotype).

All mouse specimens were imaged using the Leica DFC450C camera. For each 
experiment, a minimum of three embryos per genotype was used as we considered 
that reproducible staining/expression patterns with three distinct embryos of the 
same genotype are significant. The experiments shown were repeated at least twice. 
We did not use the randomization method. 

Subcloning of zebrafish hoxa11a/b intron and microinjections in zebrafish
embryos. The zebrafish hoxa11a (713 bp; gene ID 58061, NCBI) and hoxa11b
(747 bp; gene ID 30382, NCBI) introns were amplified from zebrafish genomic
DNA using the following primers: hoxa11a intron: fw
5'-GAATTCAACAGTAAGTACGAGCTCAAC-3'; rev 5'-GGTACCACCTAAATGTAAATACACGT-3';
hoxa11b intron: fw 5'-GAATTCCAGCGGCAGCAGCAGTACGT-3'; rev
5'-GGTACCCCGTGTCTTTTGTCCATCTAA-3'.

The zebrafish hoxa11a and hoxa11b and the mouse Hoxa11 introns were subcloned
into the pEGFP-N1 vector (CLONTECH Laboratories, Inc.) in which the
CMV promoter upstream of eGFP was replaced with the human HBB minimal promoter
using the following primers: fw 5'-GGATCCCTGGGCATAAAAGTCAG-3',
rev 5'-ACCGGTTCTGCTTCTGGAAGGCT-3'. This vector also contains the Tol2
arms to increase transgenesis efficiency. For screening purposes, a heart marker
(cmlc2:mCherry) was added to the zebrafish Tg(z-Inta11a-eGFP) and
Tg(z-Inta11b-eGFP) constructs. All constructs were microinjected in one-cell stage
wild-type zebrafish embryos at a concentration of 100 ng/µl together with 50 ng/µl
transposase mRNA.

Generation of zebrafish transgenic lines. Primary injected zebrafish (P0)
were raised until 3 months of age and then screened for transgenic progeny
(F1). P0 fish were crossed with wild-type fish and the embryos were screened
at 2 days post-fertilization (dpf). Owing to the lack of fin fold eGFP expression
in the Tg(z-Inta11a-eGFP; cmlc2:mCherry) and Tg(z-Inta11b-eGFP;
cmlc2:mCherry) injected fish, embryos were screened for the presence of the
cmlc2:mCherry heart marker and genotyped to confirm the presence of the
hoxa11a/b intron:eGFP elements. The following primers were used for genotyping:
hoxa11a: fw 5'-GGTACCACCTAAATGTAAATACACGT-3', rev
(eGFP) 5'-GTCCTCCTTGAAGTCGATGC-3'; hoxa11b: fw
5'-GGTACCCCGTGTCTTTTGTCCATCTAA-3', rev (eGFP)
5'-GTCCTCCTTGAAGTCGATGC-3'.

Three transgenic lines for Tg(m-Inta11-eGFP) were obtained to confirm
the expression pattern. A fourth line containing the cmlc2:mCherry heart
marker was also created. To confirm that the Hbb minimal promoter does not
drive tissue-specific expression alone, a transgenic line Tg(HBB:eGFP;
cmlc2:mCherry) was also created and genotyped using the following
primers: Hbb: fw 5'-GGATCCCTGGGCATAAAAGTCAG-3', rev (eGFP)
5'-GTCCTCCTTGAAGTCGATGC-3'.

Zebrafish in situ hybridization. In situ hybridization on whole-mount
embryos was performed as previously described. Digoxigenin-labelled
antisense RNA probes were generated using the following cDNAs: hoxa13a
(500 bp; Addgene 36463), hoxa13b (700 bp; Addgene 36568), hoxa11b
(probe 1 (Extended Data Fig. 8c, d); 800 bp; Addgene 36466). For hoxa11a/b
antisense/sense RNA probes (Extended Data Fig. 9a, b), hoxa11a (713 bp; gene
ID 58061, NCBI) and hoxa11b (747 bp; gene ID 30382, NCBI) partial cDNAs
(exon 1) were obtained by PCR with reverse transcription from total RNA of
24-48 hpf embryos using the following primers: hoxa11a exon 1: fw
5'-ATGATGGATTTTGACGAAAGGGTT-3', rev 5'-TGTTCCCACCGCTAGTTTTTTCCT-3';
hoxa11b exon 1: fw 5'-ATGATGGATTTTGATGAGCGGGTA-3',
rev 5'-TGCTGCTGCCGCTGAATTTATCTT-3'.

For accurate comparison, the hoxa11a and hoxa11b sense and antisense probes
are identical in length and were transcribed using the same RNA
polymerase. In situ hybridizations were also performed in parallel with identical
staining times.
Transfection and gene expression analysis. 293T cells (ATCC) were transfected
using Lipofectamine. Cells (800,000) were plated in 6-well plates. Cells were
checked for mycoplasma contamination using the Venor GeM Mycoplasma Detection
Kit (MP0025, Sigma). A total of 2 µg of DNA was used for each transfection:
250 ng reporter plasmid, 250 ng effector plasmid or empty expression vector,
25 ng of mCherry expression vector as internal control and 1.45 µg carrier
pBSK plasmid.
All transfections were performed in duplicate. Then, 24 h after transfection, the
medium was changed and 48 h after transfection, cells were processed for RNA
extraction. Reporter gene expression was normalized to internal control mCherry
(n = 3). Gene expression (Hoxa11) was measured in dissected E11.5 forelimb buds
of the Rosa26^Hoxa11 knock-in embryos that were stored in RNAlater before RNA
extraction (n = 4).

RNA extraction was done using the RNeasy Plus mini kit (Qiagen 74134).
cDNA synthesis was performed using M-MuLV reverse transcriptase (NEB)
and a mix of random primers and oligo-dT on 1 µg of total RNA. Quantitative
real-time PCR was performed with cDNA and the SYBR Green kit (Applied
Biosystems) using the following primers: fw 5'-AGGAGAAGGAGCGACGG-3'
and rev 5'-GGTATTTGGTATAAGGGCAGCG-3' (Hoxa11); fw
5'-CTTTGTCAAGCTCATTTCCTGG-3' and rev 5'-TCTTGCTCAGTGTCCTTGC-3'
(Gapdh); fw 5'-TTGACCTAAAGACCATTGCACTTC-3' and rev
5'-TTCTCATGATGACTGCAGCAAA-3' (Tbp); fw 5'-GCCTACAACGTCAACATCAAG-3'
and rev 5'-GCGTTCGTACTGTTCCAC-3' (mCherry); fw
5'-GACCCTGAAGTTCATCTGCA-3' and rev 5'-CCGTCGTCCTTGAAGAAGA-3' (gfp).
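
The text states that reporter expression was normalized to the mCherry internal control and reported as relative quantification (RQ), but does not give the formula. A common way to compute such a value is the comparative Ct method, sketched below in Python. The use of this method, the assumption of a perfect doubling per cycle, and the function name are assumptions made for illustration only.

```python
def relative_quantification(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Comparative Ct (2^-ddCt) estimate of target expression relative to a calibrator.

    ct_target / ct_reference: Ct values of the gene of interest and of the
    internal control (for example gfp and mCherry) in the test sample.
    ct_target_cal / ct_reference_cal: the same two values in the calibrator
    sample (for example the empty expression vector condition).
    Assumes 100% amplification efficiency for both primer pairs.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the target crosses threshold two cycles earlier than in
# the calibrator, with an unchanged reference, which corresponds to a 4-fold increase.
print(relative_quantification(22.0, 18.0, 24.0, 18.0))  # 4.0
```

Normalizing to a co-transfected control in this way corrects for well-to-well differences in transfection efficiency and RNA input.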
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Study approval. All mice experiments described in this article were approved by 
the Animal Care Committee of the Institut de Recherches Cliniques de Montréal
(protocols 2011-39 and 2014-14) and zebrafish experiments were approved by 
uOttawa Animal Care Committee (protocol BL-2317-R1). 
Extended Data Figure 1 | Absence of antisense transcription 3' to
the Hoxa11 promoter in the Hoxa11^eGFP/eGFP limb and evidence that
Hoxa11as-b transcripts produced in trans have no effect on Hoxa11
expression. a, b, Detection of Hoxa11as-b transcripts in wild-type
limb buds at E12.5 (a), and whole-mount in situ hybridization to detect
gfp antisense transcripts in Hoxa11^eGFP/eGFP limb buds at E12.5 (b).
c-e, Hoxa11 expression in wild-type limb buds (c), and Hoxa11as-b
(d) and Hoxa11 (e) expression in Prx1-Hoxa11as limb buds. Original
magnification, ×31.5.
Extended Data Figure 2 | Deletion of the distal enhancer in Hoxa11
intron using CRISPR-Cas9. a, Scheme of the wild-type and targeted
(Hoxa11^ΔInt) loci. Sites targeted by the single-guide RNAs (sgRNA_1
and sgRNA_2) for the CRISPR-Cas9-mediated deletion of the distal
enhancer. The blue rectangles indicate the position of the DNA probe used
to confirm the deletion by Southern blot in b. b, Lane 1 shows the 6-kb
KpnI band resulting from the CRISPR-Cas9-mediated deletion. Lane 2
was loaded with wild-type DNA. c, PCR reaction using a forward primer
located upstream of sgRNA_1 and a reverse primer located downstream
of sgRNA_2 shows the presence of a 300 bp (ΔInt 300 bp) fragment expected
for the Hoxa11^ΔInt allele. d, The sequence of the 300-bp PCR fragment
confirms the CRISPR-Cas9-mediated deletion of the Hoxa11 intronic
region containing the distal enhancer (only the sequence encompassing
the deletion breakpoints is shown).
Extended Data Figure 3 | The distal enhancer located in the Hoxa11
intron is bound by HOXA13 and HOXD13 in distal limb cells and its
activity is increased by HOXA13 in 293T cells. a, Integrative genomics
viewer (IGV) screenshot showing HOXA13 and HOXD13 ChIP-seq data
at the Hoxa11 locus. These ChIP-seq data were obtained using chromatin
from distal forelimb buds of wild-type E11.5 mouse embryos (R. Sheth
et al., manuscript submitted). b, Transfection assay shows
HOXA13-dependent activation of the Hoxa11 intron driving reporter gene expression.
Two-tailed Tukey's multiple comparisons test was performed. Error bars
indicate s.d. (n = 3). RQ, relative quantification.

Extended Data Figure 4 | Individual inactivation of Hoxa13 or Hoxd13
is not sufficient to fully abrogate antisense transcription in distal limbs.
a, b, Whole-mount in situ hybridization, using probe A (see Fig. 1)
to detect all antisense transcripts, on Hoxd13−/− (a) and Hoxa13−/−
(b) mouse limb buds at E11.5. Antisense transcription in distal limbs
remains robust in both mutants but a clear reduction is seen in the distal
Hoxa13−/− limbs. Original magnification, ×31.5.

Extended Data Figure 5 | Inactivation of both Hoxa13 and Hoxd13
disrupts antisense transcription overlapping with the Hoxa11 exon 1.
a-d, Hoxa11as-b expression (probe B in Fig. 1) in limb buds (a, b) and
tail buds (c, d) from wild-type (a, c) and Hoxa13−/− Hoxd13−/− (b, d)
E12.5 mouse embryos. Whole-mount in situ hybridization shows that
Hoxa11as-b expression in tail buds (internal control) is similar in both the
wild-type (c) and double-mutant (d) embryos, whereas there is almost no
expression remaining in Hoxa13−/− Hoxd13−/− limb buds (b). Original
magnification, ×31.5.

Extended Data Figure 6 | Generation of the RosaHoxa11 knock-in mouse
line. a, Targeting of the endogenous Rosa26 locus (top three lines).
The wild-type Rosa26 locus is shown below (middle). Regions used as
homologous arms for the recombination in ES cells are indicated by brown
rectangles labelled 5′ and 3′, respectively. Scheme of the targeted locus
after homologous recombination in ES cells and after Cre-mediated
recombination is shown at the bottom. The position of the internal (IP)
and external (EP) probes and restriction sites used for Southern blot
analysis are indicated on both the wild-type and targeted locus.
b, c, Southern blots of ES cell clones using the internal probe (b) and
external probe (c) to detect the targeted allele (lane 1). d, Southern blot of
wild-type (lane 2) and heterozygous (lane 1) mice. A, AscI; E, EcoRV; P,
PacI; S, SwaI.

Extended Data Figure 7 | The conditional gain of Hoxa11 using the
Hoxa13Cre allele results in the formation of supernumerary digits.
a, b, Autopod of RosaHoxa11/+ (a) and RosaHoxa11 Hoxa13Cre (b) at E15.5.
Anterior is up. Because the Rosa26 locus and the Hoxa13Cre allele are on
the same chromosome (Chr6), the gain-of-function phenotype was assessed
with only one copy of the RosaHoxa11 allele. c-e, Autopod skeletons of Prx1Cre;
RosaHoxa11/Hoxa11 mice at P0 from four distinct mutants (anterior is up).
The number of digits varies from 6 to 7, often with a small post-axial
extra digit (posterior). The extra-digit phenotype is fully penetrant upon
Cre-activation of two copies of the RosaHoxa11 allele (n = 10). Original
magnification, ×20. d, Quantification of Hoxa11 expression level by
quantitative reverse transcriptase PCR (RT-qPCR) on RNA extracted
from E11.5 forelimb, relative to both Gapdh and Tbp mRNA of Prx1Cre;
RosaHoxa11/Hoxa11 embryos. Two-tailed t-test was performed. Error bars
indicate s.d. (n = 4).

Extended Data Figure 8 | hoxa11 and hoxa13 are expressed in
overlapping domains in zebrafish fins. a-d, Expression of hoxa13b (a, b)
and hoxa11b (c, d) in zebrafish fins at 60 hpf (a, c) and 72 hpf (b, d). Dotted
lines indicate the boundary between the endochondral disc and the fin
fold. Original magnification, ×400.

Extended Data Figure 9 | Absence of antisense transcription at the
hoxa11a and hoxa11b loci in zebrafish fins. a, b, Whole-mount in situ
hybridization with probes designed to detect putative antisense
transcription at hoxa11a (a) and hoxa11b (b). c-f, No antisense
transcription is detected, whereas expression of hoxa11a (c), hoxa11b (d),
hoxa13a (e) and hoxa13b (f) is observed in zebrafish fins at the same stage.
Asterisks correspond to the staining from the fin on the other side of the
embryo. Original magnification, ×63.

Extended Data Table 1 | Summary of transient transgenic embryos analysed

Zebrafish Transient Transgenics

Construct                            % of eGFP positive fish
Tg(HBB:eGFP)                         0% (n=74)
Tg(z-Inta11a-eGFP)                   0% (n=105)
Tg(z-Inta11b-eGFP)                   1.19% (n=84)
Tg(m-Inta11-eGFP)                    91.9% (n=123)
Tg(HBB:eGFP; cmlc2:mCherry)          1.25% (n=94)
Tg(z-Inta11a-eGFP; cmlc2:mCherry)    0% (n=200)
Tg(z-Inta11b-eGFP; cmlc2:mCherry)    0% (n=300)
Tg(m-Inta11-eGFP; cmlc2:mCherry)     88.9% (n=53)

Mouse Transient Transgenics

Construct                            % of eGFP positive embryos (# eGFP positive / # genotyped positive)
Tg(z-Inta11a-eGFP)                   0% (n=0/10)
Tg(z-Inta11b-eGFP)                   0% (n=0/7)

Zebrafish stable lines for Tg(z-Inta11a-eGFP; cmlc2:mCherry); Tg(z-Inta11b-eGFP; cmlc2:mCherry) were also generated and three genotyped F1 embryos per line were analysed and confirmed for the absence of gfp expression.
For Tg(m-Inta11-eGFP; cmlc2:mCherry), four distinct transgenic lines were also generated and analysed.


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


doi:10.1038/nature19824 


Olfactory receptor pseudo-pseudogenes


Lucia L. Prieto-Godino, Raphael Rytz, Benoite Bargeton, Liliane Abuin, J. Roman Arguello, Matteo Dal Peraro & Richard Benton


Pseudogenes are generally considered to be non-functional DNA
sequences that arise through nonsense or frame-shift mutations
of protein-coding genes. Although certain pseudogene-derived
RNAs have regulatory roles, and some pseudogene fragments are
translated, no clear functions for pseudogene-derived proteins are
known. Olfactory receptor families contain many pseudogenes,
which reflect low selection pressures on loci no longer relevant to
the fitness of a species. Here we report the characterization of a
pseudogene in the chemosensory variant ionotropic glutamate
receptor repertoire of Drosophila sechellia, an insect endemic
to the Seychelles that feeds almost exclusively on the ripe fruit of
Morinda citrifolia. This locus, D. sechellia Ir75a, bears a premature
termination codon (PTC) that appears to be fixed in the population.


[Figure 1, panels a-f; legend below. Panels b and e plot corrected responses (spikes s−1) to acetic acid, propionic acid, 1,4-diaminobutane and pyridine. Panel f aligns the Ir75a genomic region around nucleotide 640: the D. melanogaster and D. simulans strains carry CAA at this codon, whereas all D. sechellia strains and the D. sechellia cDNA carry TAA.]


However, D. sechellia Ir75a encodes a functional receptor, owing 
to efficient translational read-through of the PTC. Read-through 
is detected only in neurons and is independent of the type of 
termination codon, but depends on the sequence downstream of 
the PTC. Furthermore, although the intact Drosophila melanogaster 
Ir75a orthologue detects acetic acid (a chemical cue important for
locating fermenting food, found only at trace levels in Morinda
fruit; ref. 10), D. sechellia Ir75a has evolved distinct odour-tuning
properties through amino-acid changes in its ligand-binding 
domain. We identify functional PTC-containing loci within 
different olfactory receptor repertoires and species, suggesting 
that such ‘pseudo-pseudogenes’ could represent a widespread 
phenomenon. 


Figure 1 | Ir75a encodes an acetic acid receptor in D. melanogaster
and is a transcribed pseudogene in D. sechellia. a, Top, schematic of the
third antennal segment covered with porous olfactory sensilla of various
morphological classes. Bottom, schematic of the ac2 sensillum class,
which houses three OSNs that each express different ionotropic receptor
genes. b, Electrophysiological responses in ac2 sensilla to the indicated
odours (mean ± s.e.m.; mixed genders) in D. melanogaster (n = 9),
D. simulans (n = 9) and D. sechellia (n = 8). The colours of the columns
on the histogram distinguish two broad chemical classes of odours:
acids (magenta) and amines (black). c, Immunostaining with anti-Ir75a
(magenta) and anti-Ir8a (green) antibodies on antennae of wild-type
(left) or Ir75a-mutant (Ir75aMB00253; right) animals. Insets show the
co-localization of Ir75a and Ir8a in the OSN soma and dendritic compartment
(arrowheads). Scale bars, 10 μm. d, Representative traces of extracellular
recordings of neuronal responses to the indicated stimuli in ac2 sensilla in
control (Ir75a-GAL4, Ir75aMB00253/+), Ir75a hemizygous mutant (Ir75a−/−;
Ir75aMB00253/Df(3L)BSC415) and Ir75a rescue (UAS-D. melanogaster
(Dm)Ir75a;Ir75a-GAL4, Ir75aMB00253/Df(3L)BSC415) animals. Bars above
the traces mark 1 s stimulus time. e, Quantification of solvent-corrected
responses in d. Data are mean ± s.e.m.; mixed genders. Control 1:
Df(3L)BSC415/+ (n = 12); control 2: Ir75a-GAL4, Ir75aMB00253/+ (n = 11);
Ir75a−/− (n = 12); Ir75a rescue (n = 13). Statistical differences between
genotypes were tested using pairwise Wilcoxon rank-sum tests among the
responses to each odorant, and P values were adjusted for multiple
comparisons using the Benjamini-Hochberg method. Significant comparisons
to Ir75a−/− are shown in the figure (***P = 0.0001, **P = 0.001; for
full information about P values see Source Data & Methods). f, Top,
gene model of Ir75a indicating the position of the C640T nucleotide
substitution in the D. sechellia orthologue. Bottom, genomic sequence
spanning this nucleotide position in D. melanogaster and several
geographically distributed D. simulans and D. sechellia strains (Methods).
The bottom italicized sequence is of the D. sechellia cDNA. The D. sechellia
C640T substitution (red) creates a PTC (underlined).
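
The multiple-comparison adjustment named in the Figure 1 legend can be illustrated with a short sketch. The function below is a plain-Python rendering of the standard Benjamini-Hochberg step-up adjustment; the function name and the input P values are hypothetical and are not taken from the paper's Source Data, and the authors' actual analysis software is not stated here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values, for illustration only (not data from the paper).
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

Each raw P value is scaled by m/rank and then capped by the adjusted value of the next-larger P value, so adjusted values never decrease as raw values increase.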

Comparative electrophysiological analysis of olfactory sensory
neuron (OSN) responses in closely related drosophilid species revealed
a loss of sensitivity to acetic acid in D. sechellia neurons housed in the
antennal coeloconic 2 (ac2) sensillum class of sensory hairs (Fig. 1a, b).
In D. melanogaster, acetic acid is detected by ac2 OSNs expressing
Ir75a (ref. 11); these sensilla house two other neurons that are sensitive
to amines and express Ir41a and Ir75d (ref. 11) (Fig. 1a). Two lines of
evidence support Ir75a as the acetic acid receptor. First, the protein is
expressed exclusively in these acetic-acid-sensing ac2 neurons, where it
co-localizes with the ionotropic receptor co-receptor Ir8a in somata
and sensory dendrites (Fig. 1c). Second, protein-null Ir75a-mutant 
animals (Fig. 1c) lack responses to acetic acid (and other organic 
acids) in these sensilla, while amine ligand-evoked action potentials 
are unaffected (Fig. 1d, e). Acid sensitivity is restored by expression of 
an Ir75a cDNA in these neurons (Fig. 1d, e). 

Ir75a orthologues are present across drosophilids, but D. sechellia
Ir75a is a predicted pseudogene. A C640T nucleotide substitution in
the open reading frame (ORF) creates a PTC (CAA→TAA) in exon 4
(Fig. 1f) that is predicted to truncate the protein within the ligand- 
binding domain (LBD). This PTC is present in all D. sechellia 
strains that we sequenced (from at least two islands of the Seychelles 
archipelago), but not in any D. melanogaster or D. simulans strain 
(Fig. 1f), suggesting that it is a derived change that is fixed in the 
D. sechellia population. We could, however, amplify Ir75a cDNA from 
D. sechellia antennal RNA. Sequencing of this cDNA, in addition to 
data from D. sechellia antennal RNA sequencing (Methods), verified 
that the PTC is not edited or spliced out of the transcript to maintain 
an intact ORF (Fig. 1f). 
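
The coordinates in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the two fragments are copied from the Fig. 1f alignment and trimmed so that they start in frame. The frame is inferred from the text, since nucleotide 640 is the first base of codon 214 (640 = 3 × 213 + 1) and the Methods give the following codon, 215, as CGT.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_number(nucleotide_position: int) -> int:
    """Return the 1-based codon containing a 1-based ORF nucleotide position."""
    return (nucleotide_position - 1) // 3 + 1


def codon_offset(nucleotide_position: int) -> int:
    """Return the position (1, 2 or 3) of the nucleotide within its codon."""
    return (nucleotide_position - 1) % 3 + 1


def first_stop(fragment: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon, if any."""
    for i in range(0, len(fragment) - 2, 3):
        if fragment[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# In-frame exon 4 fragments from the Fig. 1f alignment (frame inferred as above).
D_SIMULANS = "GCAACGGTGGTCACCCAACGTCCACTCACC"
D_SECHELLIA = "GCAACGGTGGTCACCTAACGTCCACTCACC"

print(codon_number(640), codon_offset(640))
print(first_stop(D_SIMULANS), first_stop(D_SECHELLIA))
```

The D. simulans fragment has no in-frame stop, whereas the D. sechellia fragment stops at its sixth codon (TAA), which is followed by CGT, giving the TAAC context discussed later in the text.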

The pseudogenization of D. sechellia Ir75a provided a logical expla- 
nation for the loss of responses to acetic acid in this species (Fig. 1b). 
Nevertheless, D. sechellia ac2 sensilla house a neuron that responds 
to other acidic odours (Fig. 1b), suggesting that another receptor 
is expressed in these OSNs. We were, however, unable to detect 
other acid-sensing ionotropic receptors in these cells. We therefore 
wondered whether the Ir75a pseudogene might encode a functional 
receptor. Indeed, the anti-Ir75a antibody stains OSNs in D. sechellia 
with a similar distribution to that of ac2 sensilla (Fig. 2a). Because 
the epitope of this antibody is encoded upstream of the PTC, we 
generated a second antibody that recognizes an epitope of the protein 
that is encoded downstream of the PTC (anti-Ir75a), and found 
that it labelled the same cells (Fig. 2a). We also generated a transgene 
comprising D. sechellia Ir75a cDNA in which the terminal stop codon 
was removed and the coding sequence for GFP inserted in-frame with 
the last coding codon (DsIr75a:GFP). As D. sechellia is not yet amenable 
to transgenesis, we expressed this construct in D. melanogaster Ir75a 
neurons. GFP fluorescence was detected from D. sechellia Ir75a:GFP
(Fig. 2b and Extended Data Fig. 1), indicating that the PTC is read 
through, permitting translation of the downstream GFP sequence. No 
GFP signal was observed with a control construct that retained the 
terminal stop codon (DsIr75aSTOP:GFP) (Fig. 2b and Extended Data 
Fig. 1). 

We next asked whether D. sechellia Ir75a encodes a functional 
receptor by misexpressing it in heterologous ‘ionotropic receptor 
decoder’ neurons (that is, ac4 Ir84a-mutant neurons that lack 
the endogenous ligand-specific Ir84a, but which still express the 
co-receptor Ir8a). In these cells, D. melanogaster Ir75a endowed 
sensitivity to acetic and propionic acids (Fig. 2c, d), consistent with 
the expected endogenous responses of ac2 Ir75a OSNs (Fig. 1b). By 
contrast, D. sechellia Ir75a conferred responses to propionic, butyric 
and 2-oxopentanoic acids (Fig. 2c, d). Cluster analysis revealed that 
the responses of D. sechellia Ir75a in ionotropic receptor decoder 
neurons and the endogenous responses of ac2 sensilla neurons 
group together (Fig. 2e, f). These results provide evidence that 
the D. sechellia Ir75a pseudogene encodes a functional olfactory 
receptor that accounts for the ac2 acid-sensing properties of this 
species. 

Figure 2 | Translational read-through of the PTC in D. sechellia
Ir75a permits production of a functional olfactory receptor.
a, Immunofluorescence on D. sechellia antennae with anti-Ir75a (green,
recognizing an epitope upstream of the PTC) and anti-Ir8a (magenta)
(left) or anti-Ir75a (green) and anti-Ir75aP (magenta, recognizing an
epitope downstream of the PTC) (right). The insets show the separate
channels for anti-Ir75a and anti-Ir75aP corresponding to the area
demarcated with a dashed box. Scale bars, 10 μm. b, Immunofluorescence
with anti-GFP antibodies (green) and anti-Ir8a antibodies (magenta) on
a D. melanogaster antenna in which Ir75a neurons express transgenes
encoding D. sechellia (Ds)Ir75a:GFP (UAS-DsIr75a:GFP/+;Ir75a-GAL4/+)
(left) or Ir75aSTOP:GFP (UAS-DsIr75aSTOP:GFP/+;Ir75a-GAL4/+)
(right). c, Representative traces of extracellular recordings of neuronal
responses to the indicated stimuli in ionotropic receptor (Ir) decoder
neurons expressing D. melanogaster Ir75a (UAS-DmIr75a;Ir84aGAL4)
or D. sechellia Ir75a (UAS-DsIr75a;Ir84aGAL4). d, Quantification of
odour-evoked responses in empty ionotropic receptor decoder neurons
(Ir84aGAL4, n = 5), or the decoder neurons expressing DmIr75a (n = 7-11)
or DsIr75a (n = 7-14) (genotypes as in c) (mean ± s.e.m.; mixed genders).
e, k-means cluster analysis of the responses of D. melanogaster and
D. sechellia Ir75a in the ionotropic receptor decoder and D. sechellia ac2
sensilla to the four main agonists used (acetic acid, propionic acid, butyric
acid, and 2-oxopentanoic acid). Mean and s.d. of all the solutions within
each k value (n = 100). The peak silhouette value at k = 2 was significantly
different from other k values (Student’s t-test between k = 2 and k = 3,
*P < 0.001), indicating that responses of these three distinct neuron classes
statistically fall within two clusters. f, Plot of the three first principal
components (PC) from a principal component analysis of the same odour
response profiles as in e.


Reversion of the PTC to the ancestral glutamine-encoding codon 
(TAA→CAA; *214Q) in transgenic constructs had no effect on
expression or function of D. sechellia Ir75a (Fig. 3a, b and Extended 
Data Fig. 1), indicating that the PTC is read through efficiently and 
does not influence odour responses. Translational read-through of 
terminal stop codons, resulting in C-terminal extensions, has
been characterized for several eukaryotic genes; in these
cases, the ‘leakiness’ of translation arrest is predicted to depend on
the termination codon (TGA>TAA>TAG) and the immediate
3′ nucleotide (C>T>G>A). We investigated the cis-regulatory
elements that determine the high efficiency of read-through of the Ir75a
PTC, which has the second most leaky termination codon context
(TAAC; ref. 14), by generating additional read-through GFP reporters
bearing mutations in this sequence (Fig. 3c, e). Replacement of the TAA
PTC with either TGA or TAG did not affect GFP expression (Fig. 3c, d 
and Extended Data Fig. 1). By contrast, replacing the immediate 3’ 
cytosine nucleotide with an adenosine almost completely blocked GFP 
expression (Fig. 3e and Extended Data Fig. 1), although this transgene 
still produces a truncated protein, as detected by the anti-Ir75a anti- 
body (Fig. 3e). These results indicate that sequence context outside, 
but not within, the Ir75a PTC is critical for determining read-through 
efficiency. 
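
The two leakiness orderings quoted above can be combined into a ranking of all twelve stop-codon contexts. The sketch below is an illustration rather than the published model: it assumes, without support from the text, that the 3′ nucleotide is the primary sort key and the stop codon the secondary key. That assumption is chosen only because it places TAAC second, as the text states; the constant and function names are hypothetical.

```python
from itertools import product

# Orderings quoted in the text, from most to least leaky.
STOP_CODON_ORDER = ["TGA", "TAA", "TAG"]
NEXT_BASE_ORDER = ["C", "T", "G", "A"]


def rank_contexts():
    """Rank the 12 tetranucleotide contexts (stop codon + 3' nucleotide).

    Assumption (not from the paper): the 3' nucleotide dominates, and the
    stop codon only breaks ties.
    """
    contexts = [codon + base
                for codon, base in product(STOP_CODON_ORDER, NEXT_BASE_ORDER)]
    return sorted(
        contexts,
        key=lambda c: (NEXT_BASE_ORDER.index(c[3]), STOP_CODON_ORDER.index(c[:3])),
    )


ranking = rank_contexts()
for context in ("TAAC", "TGAC", "TAGC", "TAAA"):
    print(context, ranking.index(context) + 1)
```

Under this assumed ordering, swapping the stop codon within the C context (TGAC, TAGC) keeps the reporter among the three leakiest contexts, whereas changing the 3′ cytosine to adenosine (TAAA) moves it close to the bottom, which is the direction of the effects seen in Fig. 3c-e.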

The expression of full-length protein by PTC-containing 
D. sechellia Ir75a transgenes in two populations of D. melanogaster 
OSNs (that is, Ir75a and ionotropic receptor decoder neurons) indicates 
that the mechanisms that permit read-through are not species- 
or OSN-class-specific. We investigated whether read-through 
occurs in other cell types by using an actin5C-GAL4 driver to 
broadly express D. sechellia Ir75a:GFP and, as a control, D. sechellia
Ir75a*214Q:GFP. D. sechellia Ir75a*214Q:GFP was detected in many,
but not all, cells (Extended Data Fig. 2); this heterogeneity may arise
from the variable expression of actin5C-GAL4 in different cell types.

Figure 3 | Efficiency and tissue-specificity of translational read-through
of the D. sechellia Ir75a PTC. a, Immunofluorescence with
anti-GFP on D. melanogaster antennae in which ionotropic receptor
decoder neurons express transgenes encoding D. sechellia Ir75a:GFP
(UAS-DsIr75a:GFP/+;Ir84aGAL4/+) or Ir75a*214Q:GFP
(UAS-DsIr75a*214Q:GFP/+;Ir84aGAL4/+). Scale bar, 10 μm. b, Quantification
of odour-evoked responses in empty ionotropic receptor decoder
neurons (Ir84aGAL4, n = 4-5), or those expressing D. sechellia Ir75a
(UAS-DsIr75a;Ir84aGAL4, n = 8-14) or Ir75a*214Q (UAS-DsIr75a*214Q;Ir84aGAL4,
n = 8-11). Data are mean ± s.e.m.; mixed genders. No significant
differences were found between the two genotypes in the responses to
any of the odours by Student’s t-test. c-e, Immunofluorescence with
anti-GFP (green) on D. melanogaster antennae in which ionotropic receptor
decoder neurons express D. sechellia Ir75a:GFP transgenes bearing the
indicated mutations to the PTC or 3′ nucleotide (genotypes of the form:
UAS-DsIr75axx:GFP/+;Ir84aGAL4/+). In e, immunofluorescence with
anti-Ir75a antibodies (magenta) is also shown. Signal is detected in Ir84a
neurons with anti-Ir75a (which detects an epitope upstream of the PTC)
but not anti-GFP (arrowheads), meaning that this transgene encodes
protein up to, but not beyond, the PTC. The dashed white square indicates
endogenous Ir75a neurons. Scale bar, 10 μm. f, Immunofluorescence with
anti-GFP (green) on D. melanogaster antennae in which elav-GAL4 drives
neuronal expression of D. sechellia Ir75a*214Q:GFP (elav-GAL4/+;
UAS-DsIr75a*214Q:GFP/+, 8 out of 8 brains were GFP-positive) or Ir75a:GFP
(elav-GAL4/+; UAS-DsIr75a:GFP/+, 5 out of 5 GFP-positive). Scale
bars, 10 μm. g, Immunofluorescence with anti-GFP (green) on
D. melanogaster antennae in which repo-GAL4 drives glial-specific
expression of D. sechellia Ir75a*214Q:GFP (UAS-DsIr75a*214Q:GFP/+;repo-GAL4/+,
10 out of 10 were GFP-positive) or Ir75a:GFP
(UAS-DsIr75a:GFP/+;repo-GAL4/+, 0 out of 10 were GFP-positive). Scale
bars, 10 μm. h, Immunofluorescence with anti-GFP (green) and synaptic
neuropil marker nc82 (magenta) antibodies on D. melanogaster brains in
which elav-GAL4 drives neuronal expression of D. sechellia Ir75a*214Q:GFP
(6 out of 6 were GFP-positive) or Ir75a:GFP (8 out of 8 were GFP-positive);
genotypes as in f. Scale bars, 50 μm. i, Immunofluorescence with anti-GFP
(green) and anti-nc82 (magenta) on D. melanogaster brains in which
repo-GAL4 drives glial-specific expression of D. sechellia Ir75a*214Q:GFP (6 out of
6 were GFP-positive) or Ir75a:GFP (0 out of 8 were GFP-positive); genotypes
as in g. Scale bars, 50 μm.


Nevertheless, the GFP-positive cells encompassed neurons and 
non-neuronal support cells (Extended Data Fig. 2, arrowheads). By 
contrast, D. sechellia Ir75a:GFP was detected exclusively in neurons 
(Extended Data Fig. 2). To confirm this finding, we compared the 
expression of D. sechellia Ir75a*214Q:GFP and Ir75a:GFP transgenes
induced either by a pan-neuronal (elav-GAL4) or pan-glial
(repo-GAL4) driver. Both transgenes produced similar GFP signals
in sensory neurons throughout the antenna (Fig. 3f). However, only
Ir75a*214Q:GFP was detectable in glia (Fig. 3g). Similarly, we detected
broad neuronal expression of both Ir75a*214Q:GFP and Ir75a:GFP
in the brains of these animals (Fig. 3h), but only Ir75a*214Q:GFP
was expressed in glia (Fig. 3i). Thus, efficient read-through of the
D. sechellia Ir75a PTC occurs in diverse neuronal classes, but not in 
non-neuronal cells. 

We next investigated the molecular basis of the different ligand- 
response profiles of D. sechellia and D. melanogaster Ir75a. As 
Ir75a-dependent neuron responses are conserved between 
D. melanogaster and D. simulans (Fig. 1b), we reasoned that ligand- 
specificity determinants would be conserved in Ir75a sequences from 
these species but differ in D. sechellia Ir75a; of eight such positions 
(excluding the PTC), six lie within the predicted bi-lobed LBD 
(Extended Data Fig. 3). Three of these positions are located within the 
putative ligand-binding pocket in a protein homology model of the 
D. sechellia Ir75a LBD (Fig. 4a). We tested their function by generating 
a version of D. melanogaster Ir75a in which each of these positions 
was mutated to encode the amino acid present in D. sechellia Ir75a, 
and expressed this mutant (D. melanogaster Ir75aT289S,Q536K,F538L)

Figure 4 | Molecular basis of the functional divergence of D. sechellia
Ir75a. a, Protein homology model of the LBD of D. sechellia Ir75a
(Methods). The side chains of the residues that are different in D. sechellia
Ir75a compared to D. simulans and D. melanogaster Ir75a are represented
in pink, and the subset of these mutated in this study are shown in red.
The location of the residue putatively encoded by the PTC is shown in
yellow. b, Quantification of odour-evoked responses in empty ionotropic
receptor decoder neurons (Ir84aGAL4, n = 5-6) or the decoder neurons
expressing D. melanogaster Ir75a (n = 9), D. sechellia Ir75a (n = 8-9) or
the D. melanogaster Ir75aT289S,Q536K,F538L mutant (n = 14) (genotypes used,
UAS-DxIr75axx;Ir84aGAL4) (mean ± s.e.m.; mixed genders); experiments
for all transgenes were performed in parallel. Responses to each odour of
D. sechellia Ir75a and D. melanogaster Ir75aT289S,Q536K,F538L are statistically
indistinguishable (Wilcoxon rank-sum test).


in ionotropic receptor decoder neurons. This engineered receptor 
conferred responses indistinguishable from those of D. sechellia Ir75a 
(Fig. 4b), indicating the importance of one or more of these residues 
as odour-specificity determinants. 

Finally, we sought to identify other functional olfactory receptor 
pseudogenes. Within wild-caught isolates of D. melanogaster 
(Methods), we identified several strains in which the Ir75b ORF 
contains a C517T substitution, creating a PTC (Fig. 5a and Methods). 
Immunostaining of antennae from flies of these lines with an Ir75b 
antibody (recognizing an epitope encoded downstream of the PTC) 
showed a pattern comparable to controls (Fig. 5a, b), indicating that 
this PTC is read through. Consistently, electrophysiological recordings 
of Ir75b neurons in one of these lines (Raleigh707, RAL707) revealed
robust responses when presented with known agonists (ref. 11) (Fig. 5c).
In another strain (RAL441), the Ir31a ORF contains a T1805G
substitution that is predicted to truncate the receptor before the third
transmembrane domain (Fig. 5d). Nevertheless, electrophysiological
recordings of RAL441 Ir31a neurons stimulated with 2-oxopentanoic
acid (ref. 11) revealed clear responses (Fig. 5e). We also identified a
segregating PTC-containing allele of the odorant receptor gene Or35a (in
Tasmanian strains T09 and T29) (Fig. 5f). In strain T09, responses
of Or35a neurons to octanol were readily detected (Fig. 5g). These
findings indicate that the phenomenon of functional olfactory pseu- 
dogenes is restricted neither to a particular species nor to a specific 
receptor repertoire. 

Our efforts to understand the molecular basis of the loss of olfactory 
sensitivity to acetic acid in D. sechellia led us to discover a notable and, 
to our knowledge, unprecedented evolutionary trajectory of a presumed 
pseudogene. Efficient read-through of a PTC in D. sechellia Ir75a 
permits production of a full-length receptor protein, in which reduction
in acetic acid sensitivity and gain of responses to other acids is due 
to lineage-specific amino acid substitutions in the LBD pocket. The 
PTC does not noticeably influence the activity of D. sechellia Ir75a, 
suggesting that it is selectively neutral from an evolutionary stand- 
point. We propose that it became fixed through genetic drift, given 
D. sechellia’s persistently low effective population size.

It is not yet clear how the D. sechellia Ir75a PTC is read through. 
It cannot be because of insertion of the alternative amino acid 

Figure 5 | Functional pseudogene alleles of other receptors in other
species. a, Gene structure of D. melanogaster Ir75b indicating the position
of the PTC in the RAL707 strain (an identical sequence is found in all
other strains containing a PTC at this position; see Methods). The region
encoding the epitope recognized by anti-Ir75b antibodies is indicated
with a red bar. b, Immunofluorescence with anti-Ir75b antibodies on
antennae of reference D. melanogaster (Methods) and RAL707 flies. Scale
bars, 10 μm. c, Quantification of odour-evoked responses in ac3 sensilla to
propionic acid and butyric acid in control (reference D. melanogaster) and
RAL707 flies. Box plots indicate the median and first and third quartile
of the data; each dot corresponds to one recording. d, Gene structure of
D. melanogaster Ir31a, indicating the position of the PTC in the RAL441
strain. e, Quantification of odour-evoked responses to 2-oxopentanoic
acid in ac1 sensilla of control (reference D. melanogaster) and RAL441
flies. f, Gene structure of D. melanogaster Or35a, indicating the position
of the PTC in the T09 (and T29) strain. g, Quantification of odour-evoked
responses to octanol in ac3 sensilla of control (reference D. melanogaster)
and T09 flies.


selenocysteine (which is incorporated at UGA). Moreover, no
suppressor tRNAs are known in D. melanogaster and ribosomal
frame-shifting is also unlikely because there is no change in the reading 
frame after the PTC. We suggest that read-through is due to PTC rec- 
ognition by a near-cognate tRNA that allows insertion of an amino acid 
instead of translation termination. Although the trans-acting factors 
regulating read-through are unclear, the neuronal specificity of this pro- 
cess is reminiscent of RNA editing and micro-exon splicing, in which 
key responsible regulatory proteins are neuronally enriched. We
therefore speculate that tissue-specific expression differences in tRNA 
populations underlie neuron-specific read-through. 

Although it has long been known that viruses display PTC read-through,
case studies in eukaryotes are largely limited to artificial
scenarios in which nonsense mutations have been introduced through
random or site-directed mutagenesis, or in human disease-causing
alleles with low read-through rates. Our characterization of four
protein-coding PTC-containing genes demonstrates that the read-through
of naturally occurring PTCs can be sufficiently efficient to
permit the functionality of pseudogenes and the maintenance of these
variants in populations. This finding further highlights the plasticity
of translational regulation, allowing for the phenotypic buffering of
genetic changes. It should also prompt the experimental examination
of the hundreds of PTC-containing presumed pseudogenes, both
within and beyond chemosensory gene families in insects, humans
and other organisms.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS


No statistical methods were used to predetermine sample size. These experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Molecular biology and transgenesis. D. melanogaster and D. sechellia Ir75a 
ORFs (including the PTC in D. sechellia Ir75a) were cloned into pUAST-attB™. 
Site-directed mutagenesis of D. sechellia Ir75a and D. melanogaster Ir75a, reverse- 
transcription PCR amplification and sequencing of genomic amplicons to 
verify the presence of the D. sechellia Ir75a PTC were performed using standard 
procedures. For the transgene in which the nucleotide 3’ of the PTC was mutated 
(Fig. 3e), we changed codon 215 from CGT to AGA to maintain the identity of
the encoded amino acid (arginine). Oligonucleotide and plasmid sequences 
are available upon request. New transgenes were integrated in attP40 using the 
phiC31 site-specific integration system by Genetic Services Inc. or BestGene 
Inc. All transgenes were sequence-verified both before and after integration into 
D. melanogaster. 
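
The codon 215 substitution described above can be checked with a short sketch. The codon table below is a small subset of the standard genetic code, and the function names are illustrative assumptions rather than part of the original methods; the sketch only confirms that CGT and AGA both encode arginine while differing at the first base of the codon.

```python
# Subset of the standard genetic code: the six arginine codons, the three
# stop codons and one glutamine codon (included only for contrast).
CODON_TABLE = {
    "CGT": "Arg", "CGC": "Arg", "CGA": "Arg", "CGG": "Arg",
    "AGA": "Arg", "AGG": "Arg",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
    "CAA": "Gln",
}


def is_synonymous(codon_a, codon_b):
    """Return True if both codons encode the same amino acid."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


def first_base_changed(codon_a, codon_b):
    """Return True if the 5' base of the codon differs between the two."""
    return codon_a[0].upper() != codon_b[0].upper()


# CGT -> AGA alters the first base of the codon but keeps arginine.
print(is_synonymous("CGT", "AGA"), first_base_changed("CGT", "AGA"))
```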

Drosophila strains. Flies were maintained at 25°C in 12h light:12h dark 
conditions. D. melanogaster wild-type refers to w1118, unless noted otherwise.
D. sechellia wild-type was 14021-0248.25 (Drosophila Species Stock Center, UCSD). 
The sequenced region of Ir75a shown in Fig. 1f was amplified from the following 
strains (from the Drosophila Species Stock Center, unless noted otherwise): 
D. sechellia: 14021-0248.08, 14021-0248.11, 14021-0248.13, 14021-0248.15, 
14021-0248.19, 14021-0248.25, 14021-0248.27, 14021-0248.30; D. simulans: 
14021-0251.195, 14021-0251.196, 14021-0251.197, as well as a Seychelles-isolated 
D. simulans*?. Other published mutant and transgenic lines used were: Ir84a°4/4
(ref. 34), Ir75a-GAL4 (ref. 11), UAS-CD8:GFP*, Mi{ET1}Ir75a[MB00253]°°, Df(3L)
BSC415 (Ir75a deficiency)°’, actin5C-GAL4 (ref. 38), elav-GAL4 (Bloomington
458), repo-GAL4 (Bloomington 7415), RAL441, RAL707 (ref. 39) and Tasmania 
TO9 (ref. 40). 

Sequence analysis. We downloaded D. sechellia antennal RNA-sequencing 
datasets‘! from the NCBI Gene Expression Omnibus repository (GEO accessions 
GSE67861 and GSE67587; SRA files SRR1952772, SRR1952777, SRR1973487,
SRR1973490). The sra files were converted to fastq files and remapped to the D.
sechellia genome (r1.3) using TopHat (v2.0.13; --b2-sensitive). The genomic index
and splicing index were also generated with TopHat using the D. sechellia gtf (r1.3). 
The resulting bam files were visualized within IGV (v2.3.63), and we manually 
inspected reads that covered the Ir75a PTC. Within these four datasets, ~100% of 
the reads supported the presence of the PTC-causing “T” allele (only 6/1777 reads 
within all four datasets supported an alternative nucleotide, within the noise of 
sequencing errors). 

PTCs in other olfactory receptor genes were identified in the Drosophila
melanogaster Genetic Reference Panel (DGRP; http://dgrp2.gnets.ncsu.edu/)*?
and/or the Global Diversity Lines (GDL)**“”. For Ir75b, the following lines contain 
the same PTC: DGRP (RAL181, RAL189, RAL227, RAL320, RAL348, RAL352, 
RAL358, RAL374, RAL379, RAL382, RAL385, RAL395, RAL399, RAL439, 
RAL461, RAL531, RAL596, RAL707, RAL712, RAL716, RAL730, RAL804, 
RAL805, RAL821, RAL855, RAL884), GDL (B10, B11, B12, I26, N01, N02, N07,
N25, T29). We re-sequenced the PTC-containing region and confirmed Ir75b 
protein expression in all strains from the GDL (data not shown). For Ir31a, only 
RAL441 contains the PTC. For Or35a, GDL Tasmanian strains T09 and T29 
contain the same PTC.

Histology and morphological analyses. Immunofluorescence on whole-mount
antennae or antennal cryosections (used only in Fig. 1c, inset, and
Fig. 2a) was performed as described!!1*. Affinity-purified antibodies
were generated by Proteintech Group, Inc., against the following peptides: 
KRSKYGNREQLTDVVLRYV (anti-Ir75a, in rabbits; used at 1:100 for whole- 
mount antennae, and 1:500 for cryosections), and RPLTLSDDELIRFLSQEND 
(anti-Ir75a”, in guinea pigs; used at 1:10) and PDVRDLYRKKVLGSKRSPD 
(anti-Ir75b, in guinea pigs; used at 1:500); these peptides are predicted antigenic, 
surface-exposed sequences conserved in D. melanogaster, D. simulans and 
D. sechellia orthologues. Other antibodies used were (concentrations listed are for 
whole-mount antennal stains; for cryosections, antibodies were used at tenfold
higher dilutions): guinea-pig anti-Ir8a 1:1,000 (ref. 13), chicken anti-GFP 1:100 
(AbCam), rat anti-Elav 1:10 (7E8A10; Developmental Studies Hybridoma Bank) 
and mouse monoclonal nc82 1:10 (Developmental Studies Hybridoma Bank). 
Alexa488-, Cy3- and Cy5-conjugated goat anti-mouse IgG, goat anti-guinea- 
pig IgG and goat anti-rabbit IgG secondary antibodies (Molecular Probes and 
Jackson Immunoresearch) were used at 1:100, and Alexa488-conjugated donkey 
anti-chicken IgG secondary antibodies were used at 1:500. 

The quantifications in Extended Data Fig. 1 were performed using ImageJ
(http://imagej.nih.gov/ij/). In brief, for each antenna a single plane with several
Ir75a-positive cells was chosen and cropped to exclude background from the
surroundings. The Ir75a channel was used to create a mask of the cells by using the 
auto-threshold function. This mask was then applied to the composite image (with 
overlapping anti-Ir75a in the red channel and anti-GFP in the green channel). The 
new image was analysed using the ‘color histogram’ tool to give the total number 
of red and green pixels within the masked cells; these values were used to calculate 
the ratio of green to red pixels. 
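
The masking and ratio steps above can be sketched in a few lines. This is a simplified illustration, not the ImageJ procedure itself: it assumes a fixed, hypothetical threshold in place of the auto-threshold function and sums pixel intensities in place of the 'color histogram' output, and the function name and toy pixel values are invented for the example.

```python
def green_to_red_ratio(red, green, threshold):
    """Ratio of summed green to summed red signal inside a red-channel mask.

    red, green: 2D lists of pixel intensities with the same shape.
    threshold: pixels whose red (anti-Ir75a) intensity exceeds this value
    form the cell mask (a stand-in for ImageJ's auto-threshold).
    """
    red_total = 0
    green_total = 0
    for red_row, green_row in zip(red, green):
        for r, g in zip(red_row, green_row):
            if r > threshold:  # pixel lies inside the mask
                red_total += r
                green_total += g
    if red_total == 0:
        raise ValueError("mask is empty")
    return green_total / red_total


# Toy 2 x 2 image: two pixels pass the mask (red values 10 and 20).
red_channel = [[0, 10], [20, 0]]
green_channel = [[5, 5], [10, 50]]
print(green_to_red_ratio(red_channel, green_channel, threshold=5))  # 0.5
```

Green signal outside the mask (here the pixel with value 50) does not contribute, which is the purpose of restricting the measurement to Ir75a-positive cells.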

Electrophysiology. Single sensillum electrophysiological recordings were 
performed essentially as described**> on 2-10 day old animals. The sample sizes 
(n) indicated in the figure legends correspond to biological replicates (different 
sensilla), with a maximum of three sensilla per animal. Exact sample sizes for 
each experimental or group condition are provided in the Source Data. Genotypes 
(not blinded to experimenter) were interleaved to minimise effects of time-of-day 
and animal age. All odour-evoked responses were corrected for solvent-evoked 
spikes. Chemicals were purchased from Sigma-Aldrich and were of the highest 
purity available. Odorants were used at 1% (v/v) in all experiments unless otherwise
noted in the figure legends. Odour stimulus cartridges (10 µl odour dilution
on ~5 × 5 mm Sugi strip placed in a 2-ml plastic syringe) were prepared freshly
before each recording session; cartridges were interleaved, with a maximum of 
five uses. Stimuli, with CAS number and solvents used in brackets, are as follows: 
1,4-diaminobutane (110-60-1; H2O), 2-oxopentanoic acid (1821-02-9; paraffin 
oil), acetic acid (64-19-7; H2O), butyric acid (107-92-6; H2O), hexanoic acid
(142-62-1; H2O), octanol (111-87-5; paraffin oil), phenylethylamine (64-04-0; 
paraffin oil), propionic acid (79-09-4; H2O), pyridine (110-86-1; paraffin oil). 
Ionotropic receptor decoder neurons are ac4 Ir84a-mutant neurons that lack the 
endogenous ligand-specific Ir84a, but that still express the co-receptor Ir8a**. For 
measurement of odour-evoked responses of Ir75b neurons in ac3 sensilla (Fig. 5c), 
we note approximately half of the analysed sensilla belong to ‘ac3II’ class 
(expressing Ir75c) that are electrophysiologically indistinguishable from ac3I 
sensilla (expressing Ir75b; L.L.P.-G., R.R. and R.B., unpublished data). Thus, if the 
RAL707 PTC-bearing allele is non-functional, we would expect half of the sensilla 
to show no responses, which is not the case. 

Statistical analysis. Sample sizes were fixed before data analysis, based on 
preliminary studies. Data were analysed and plotted using ‘R project’
(http://R-project.org). Data were analysed statistically using the Shapiro-Wilk test to assess
for normality followed by a two-tailed Student's t-test or a Wilcoxon rank-sum test 
as appropriate. When a P value correction for multiple comparisons was needed, 
the Benjamini-Hochberg method was used. Full statistical test results are provided 
in the Source Data files for each figure. 
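
For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the following is a minimal sketch of how adjusted P values are obtained. It is written in Python for illustration (the analyses themselves were run in R), and the function name and example P values are assumptions made for this sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, scaling each by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four pairwise comparisons.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

With these four hypothetical inputs the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02, so all four comparisons would remain below a 0.05 threshold.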

Cluster analysis of odour responses. Spike-count data (from responses of 
endogenous or ionotropic receptor decoder neurons) were imported to Matlab 
(Mathworks), and standardized using z-scores across each recording. An unbiased 
k-means cluster analysis was performed using the response properties to the 
indicated number of odours. The optimal number of clusters was determined by 
the silhouette method”. The silhouette value is a measure of both how tightly 
a data point is associated with its assigned cluster and how dissimilar it is from 
other clusters, and it peaks at the ‘correct’ number of clusters. If the distribution 
is unimodal (that is, all data fall within one cluster) the silhouette value does not 
peak and is similar for each value of k. In brief, we first ran iteratively the Matlab 
k-means algorithm 100 times for k values between 2 and 6. We then calculated 
the silhouette value for each k-means solution, and subsequently computed and 
plotted the mean silhouette value and its standard deviation of all of the solutions 
within each k value. The results of the clusters from the k-means analysis were
plotted in Matlab on the principal component space after performing a principal 
component analysis using the same odour dataset. The scripts used to analyse data 
are available upon request. 
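
The silhouette criterion described above can be illustrated with a short sketch. This is not the Matlab analysis used for the figures: it is a plain Python illustration that assumes Euclidean distance, takes cluster labels as given instead of running k-means, and uses invented toy data and function names.

```python
import math


def zscore(values):
    """Standardize one recording: subtract the mean, divide by the sample SD."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def mean_silhouette(points, labels):
    """Mean silhouette value of a cluster assignment (needs >= 2 clusters)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the point's own cluster
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(d) / len(d)
            for d in (
                [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c]
                for c in clusters if c != labels[i]
            )
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels follow the groups
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(good > bad)
```

In the procedure described in the text, a value of this kind would be computed for each of the 100 k-means solutions at every k between 2 and 6, and the mean and standard deviation compared across k.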

Protein homology modelling. A multiple-sequence alignment of the LBD of 
D. sechellia Ir75a, D. melanogaster antennal ionotropic receptors (Ir8a, Ir21a, Ir25a,
Ir31a, Ir40a, Ir41a, Ir64a, Ir75a, Ir75b, Ir75c, Ir76a, Ir76b, Ir84a, Ir92a, Ir93a)°,
Rattus norvegicus GluK2 (UniProt ID P42260), Rattus norvegicus GluA2 (P19491),
Adineta vaga GluR1 (E9P5T5) and Synechocystis PCC 6803 GluR0 (P73797) was
generated by PROMALS3D”. A GT dipeptide sequence was introduced between
the S1 and S2 domains of the Drosophila proteins to facilitate alignment with the
linker sequence included in the crystallized GluA2, GluK2, AvGluR1 and GluR0
LBDs. Alignment of Ir75a proteins was curated according to PSIPRED secondary
structure predictions**. Models of the D. sechellia Ir75a LBD (V205-T318-GT-
K434-C579) were built using MODELLER (mod9.12)” using as templates the apo
and ligand-bound crystal structures of GluA2 (PDB ID: 1FTO apo state; 1FTM 
ligand bound”). The results of standard MODELLER energy functions, molpdf 
and DOPE, were highly similar between generated models. We illustrate in Fig. 4a 
the model with the lowest DOPE energy function score. 
Extended Data Figure 1 | Quantification of efficiency and tissue-specificity
of translational read-through of the D. sechellia Ir75a PTC.
Quantification of GFP staining in the cell bodies of neurons expressing
different read-through reporter constructs in different populations
of OSNs (see Figs 2, 3 for genotypes). GFP fluorescence levels were
normalized by anti-Ir75a fluorescence levels in the Cy3 channel within
each analysed cell. Box plots indicate the median and first and third
quartile of the data. *P < 0.05, ***P < 0.0005, not significant (n.s.)
P > 0.05 (all data analysed using pairwise Wilcoxon rank-sum test,
Benjamini-Hochberg correction).
[Plot x axis: ratio of α-GFP to α-Ir75a fluorescence.]
[Figure panel labels: actin5C-GAL4; CAA (*214Q)]

Extended Data Figure 2 | Tissue specificity of translational read- 
through of the D. sechellia Ir75a PTC. Immunofluorescence with 
anti-GFP (green) and the neuron nuclear marker anti-Elav (magenta) on 
whole-mount D. melanogaster antennae in which actin5C-GAL4 drives 
broad expression of D. sechellia Ir75a°7!42:GFP (UAS-DsIr75a‘°7!42:GFP/ 
act5C-GAL4) or Ir75a:GFP (UAS-DsIr75a:GFP/act5C-GAL4). Arrowheads 
indicate examples of GFP-expressing, Elav-negative, non-neuronal cells 
that were observed in 6 out of 6 antennae expressing the control transgene 
lacking the PTC, and in 0 out of 6 antennae expressing the PTC-containing 
transgene. Note that the neuronal GFP signal of both transgenes is 
heterogeneous across the antenna, possibly because of the variable strength 
of driver expression and/or instability of the GFP-tagged receptors in 
heterologous neurons. Scale bars, 10 µm.
[Figure: protein-sequence alignment of D. melanogaster (Dmel), D. simulans (Dsim) and D. sechellia (Dsec) Ir75a, residues 1-629.]

Extended Data Figure 3 | Alignment of drosophilid Ir75a orthologues.
Protein-sequence alignment of D. melanogaster, D. simulans and D.
sechellia Ir75a. Blue bars indicate the S1 and S2 lobes of the predicted LBD.
The position of the PTC (X) is highlighted in yellow. Dark grey columns
in the alignment highlight amino acids conserved only in two of the three
species. Pink and red shading represents D. sechellia-specific amino acid
changes within the LBD; red denotes the subset located in the internal
cavity of the binding pocket (Fig. 4a). The locations of the peptide epitopes
for the Ir75a antibodies are highlighted with green dashed boxes.
1970s and ‘Patient 0’ HIV-1 genomes illuminate
early HIV/AIDS history in North America 


Michael Worobey!, Thomas D. Watts!, Richard A. McKay’, Marc A. Suchard’, Timothy Granade’, Dirk E. Teuwen’, 
Beryl A. Koblin®, Walid Heneine*, Philippe Lemey’ & Harold W. Jaffe* 


The emergence of HIV-1 group M subtype B in North American men 
who have sex with men was a key turning point in the HIV/AIDS 
pandemic. Phylogenetic studies have suggested cryptic subtype B 
circulation in the United States (US) throughout the 1970s!” and an 
even older presence in the Caribbean”. However, these temporal and 
geographical inferences, based upon partial HIV-1 genomes that 
postdate the recognition of AIDS in 1981, remain contentious** and 
the earliest movements of the virus within the US are unknown. We 
serologically screened >2,000 1970s serum samples and developed 
a highly sensitive approach for recovering viral RNA from degraded 
archival samples. Here, we report eight coding-complete genomes 
from US serum samples from 1978-1979—eight of the nine 
oldest HIV-1 group M genomes to date. This early, full-genome 
‘snapshot’ reveals that the US HIV-1 epidemic exhibited extensive
genetic diversity in the 1970s but also provides strong evidence for 
its emergence from a pre-existing Caribbean epidemic. Bayesian 
phylogenetic analyses estimate the jump to the US at around 1970 
and place the ancestral US virus in New York City with 0.99 posterior 
probability support, strongly suggesting this was the crucial hub 
of early US HIV/AIDS diversification. Logistic growth coalescent 
models reveal epidemic doubling times of 0.86 and 1.12 years for the 
US and Caribbean, respectively, suggesting rapid early expansion 
in each location*. Comparisons with more recent data reveal many 
of these insights to be unattainable without archival, full-genome 
sequences. We also recovered the HIV-1 genome from the individual 
known as ‘Patient 0’ (ref. 5) and found neither biological nor 
historical evidence that he was the primary case in the US or for 
subtype B as a whole. We discuss the genesis and persistence of this 
belief in the light of these evolutionary insights. 

No comprehensive genomic analysis of the emergence and early 
spread of HIV-1 in North America—where HIV/AIDS was first 
recognized—has been possible because the only pre-1980 HIV-1 
group M genome currently available (strain Z321B) was sampled 
in Africa. To fill this gap, we performed serological screening 
and viral genome sequencing of archived serum samples 
dating back to 1978-1979, originally collected from men who 
have sex with men (MSM) cohort patients in New York City 
(NYC) and San Francisco (SF). NYC samples were from volun- 
teers in a prospective study of AIDS established in 1984 (ref. 6), 
378 of whom had been part of an earlier cohort of 8,906 men 
involved in hepatitis B virus (HBV) studies beginning in 1978 (ref. 7), 
and for which stored sera from 1978 and/or 1979 were available’. 
Previous work showed that 6.6% of these sera from NYC in 1978-1979 
were HIV-1 seropositive®; 33 of these positive samples were chosen 
for attempted HIV-1 sequencing. The sera from SF originated from 
a study of approximately 6,875 patients enrolled in the late 1970s in 
HBV studies at the San Francisco City Clinic’. We tested 2,231 of these 
samples from 1978 and found, by western blot, that 83 (3.7%) were
positive for HIV-1 antibodies; of these, 20 were randomly chosen for
attempted HIV-1 sequencing. 

Low template number and degradation arising from long-term stor- 
age were major challenges for genomic analysis, as encountered previ- 
ously with similar samples’®: recovered RNA was generally below the 
limits of quantification and initial attempts at amplification of reverse- 
transcribed viral RNA failed consistently and indicated that viral RNA 
survived in the 1970s samples only in short fragments. This led us to 
design an RNA ‘jackhammering’ approach to greatly increase both the 
ability to detect viral RNA-positive samples and to recover complete 
genomic HIV-1 sequences from them. Briefly, we used large panels of 
primers to amplify many short fragments in separate pools, such that 
amplicons overlapped between but not within each pool (Extended 
Data Fig. 1 and Supplementary Table 1). Each pool’s amplicon set filled 
gaps between those of complementary pools, with the entire panel pro- 
viding complete genomic coverage. Moreover, a preliminary, multiplex 
amplification step greatly concentrated target RNA before final ampli- 
fication and sequencing. 
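
The pooling logic described above (amplicons that overlap between pools but never within a pool) can be sketched as follows. This is an illustration only: the genome length, amplicon length and step size are hypothetical values chosen for the example, the function names are invented, and the actual primer panels are those given in Supplementary Table 1.

```python
def tile_amplicons(genome_length, amplicon_length, step):
    """Tile half-open (start, end) amplicons; step < amplicon_length gives overlap."""
    amplicons = []
    start = 0
    while start < genome_length:
        end = min(start + amplicon_length, genome_length)
        amplicons.append((start, end))
        if end == genome_length:
            break
        start += step
    return amplicons


def assign_pools(amplicons):
    """Place each amplicon in the first pool whose last amplicon it does not overlap."""
    pools = []
    for amp in amplicons:
        for pool in pools:
            if pool[-1][1] <= amp[0]:  # previous amplicon ends before this one starts
                pool.append(amp)
                break
        else:
            pools.append([amp])  # overlaps every existing pool: open a new one
    return pools


# Hypothetical numbers: 1,000-nt target, 200-nt amplicons, 150-nt step.
amplicons = tile_amplicons(1000, 200, 150)
pools = assign_pools(amplicons)
for number, pool in enumerate(pools, start=1):
    print(f"pool {number}: {pool}")
```

With these hypothetical settings the tiling yields seven amplicons split across two pools; each pool leaves gaps that the other pool covers, so the two pools together span the whole target.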

Three samples from SF and five from NYC provided sufficient data to 
assemble coding-complete sequences. Bayesian phylogenetic analyses 
of these HIV-1 genomes (Fig. 1 and Extended Data Fig. 2) showed 
that although they were the oldest sampled outside Africa, they do 
not fall on the deepest branches even within subtype B. Instead, the 
1970s genomes and the US epidemic as a whole were phylogenetically 
nested within the more genetically diverse, older subtype B epidemic 
in Caribbean countries. Separate analyses of gag, pol and env sequences 
also placed the US sequences in a strongly supported monophyletic 
clade nested within the paraphyletic Caribbean subtype B sequences 
from Haiti, Dominican Republic, Jamaica, Trinidad and Tobago and 
Haitian immigrants in the US (Extended Data Figs 3, 4). Molecular 
clock phylogeographic analysis of the complete genome data sup- 
ported a subtype B ancestor in the Caribbean (posterior probability 
>0.99) dating to 1967 (95% credibility interval 1963-1970) (Extended 
Data Table 1). This provided genome-wide evidence that the epidemic 
moved from the Caribbean to the US rather than from the US to the 
Caribbean’. 

Location transition estimates recovered a relatively precise date (1971 
(1969-1973), Extended Data Table 1) for the HIV-1 jump from the 
Caribbean, very shortly before the US most recent common ancestor 
(MRCA). This narrow timing is aided by the basal relationship of a 
very close relative from the Caribbean (sequence ‘H6’ from an individ- 
ual who entered the US from Haiti in 1981)? (Extended Data Fig. 2). 
The probability density of the date of introduction to the US over- 
laps with the deep branching structure in Caribbean diversity (Fig. 1 
and Extended Data Fig. 3), indicating that the US clade emerged from 
the Caribbean epidemic during its early growth phase. We estimated 
a relatively fast logistic growth rate of 0.62 (0.26-0.99) yr⁻¹ within the
Caribbean population (Fig. 2). That of the US population was even 
Figure 1 | Maximum clade credibility (MCC) tree summary of the
Bayesian spatio-temporal reconstruction based on complete HIV-1
genome data. The tips of the tree correspond to year of sampling while
branch and node colours reflect the sampling locations for the tip branches
and the inferred locations for the internal branches. Tip labels are provided
for the newly obtained archival HIV-1 genomes. Diameters of internal
node circles reflect posterior location probability values. Thick outer
circles represent internal nodes with posterior probability support
>0.95. We also depict the posterior probability density (grey) for the
time of the introduction event from the Caribbean into the US on the
time scale of the tree. A fully annotated tree for this data set (‘full genome
38’, which includes only sequences sampled early in the US epidemic)
is shown in Extended Data Fig. 2b; ‘full genome 46’, which includes all
available complete genomes basal to the ‘pandemic clade’ of subtype B,
plus a similar number and date range of US pandemic clade sequences, is
shown in Extended Data Fig. 2a. Separate analyses of gag, pol, env, and the
coding-complete genomes (including also sequences sampled later in the
US epidemic) provide consistent results (Extended Data Figs 3, 4).


higher, 0.81 (0.65-0.98) yr⁻¹, in line with a precipitous spread among
existing high-risk sexual networks. These mean growth rate estimates 
corresponded to doubling times of 1.12 years and 0.86 years for the 
Caribbean and the US, respectively; both the more rapid and longer 
growth in the US appear to have contributed a higher number of 
‘effective infections’ (Fig. 2), with the US overtaking the Caribbean by 
around 1977 despite the later HIV-1 emergence in the US. 
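
The correspondence between the growth rates and doubling times quoted above can be reproduced with a short calculation. The sketch assumes that early logistic growth is approximately exponential, so that the doubling time equals ln(2) divided by the growth rate; the function name is illustrative.

```python
import math


def doubling_time(growth_rate_per_year):
    """Doubling time in years for exponential growth at the given rate."""
    return math.log(2) / growth_rate_per_year


# Mean logistic growth rate estimates reported in the text (per year).
for location, rate in [("Caribbean", 0.62), ("US", 0.81)]:
    print(f"{location}: {doubling_time(rate):.2f} years")
```

This gives 1.12 years for the Caribbean and 0.86 years for the US, matching the values reported in the text.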

Molecular clock analyses of larger numbers of env sequences revealed 
similar time of the most recent common ancestor (TMRCA) estimates 
for the key nodes (Fig. 3, Extended Data Table 1 and Extended Data 
Figs 5, 6). Interestingly, our modest snapshot of 1970s sequences from 
NYC and SF (Fig. 3, Extended Data Fig. 5b) encompassed the full diver- 
sity exhibited by HIV-1 sequences from later years (that is, it shares the 
same MRCA as larger sequence sets sampled in later years): all post-1985 
US sequences are nested within the early diversity captured by the lim- 
ited number of 1970s sequences we recovered (Extended Data Fig. 6). 

A phylogeographic reconstruction including only those US 
sequences sampled from known locations between 1978 and 1984 (Fig. 1)
demonstrated that the NYC epidemic was already relatively mature
and genetically diverse by 1979, tracing back to an MRCA estimated at
1972 (1970-1974); there is also strong support for the idea that the US
subtype B ancestor circulated in NYC (posterior probability = 0.99).
Indeed, the extensive genetic diversity in the US (and in NYC in 
particular) in 1978-1979 can be explained only by several years of 
circulation of the virus before 1978-1979. 

Using sequences sampled from NYC, North Carolina and California 
relatively late in the epidemic (comparable to the 1978-1984 East coast, 
West coast and Southern sampling), we still inferred a US ancestor in 
Figure 2 | Demographic reconstruction based on the nested coalescent
model. The colour scheme is consistent with that of the phylogeographic
analyses in Figs 1 and 3: the constant-logistic population size estimates
(the ‘effective number of infections’, Ne, multiplied by the mean viral
generation time, τ) through time are depicted in a black-yellow
colour range (following the African and Caribbean locations in the
phylogeographic analyses) while the logistic population size estimates for
the nested US clade are shown in blue (as for the US/NYC location in the
phylogeographic analyses).


NYC, but with only modest support that prevents us from drawing 
firm conclusions (pp = 0.67, Extended Data Figure 6b and Extended 
Data Table 1). As a generality, early samples close to the deep branching 
structure are essential to confidently reconstruct the initial spatio- 
temporal expansion dynamics in exponentially growing populations. 

Compared to NYC, the SF epidemic in 1978 appeared to have been 
established more recently (Figs 1, 3, and Extended Data Figs 2b, 5b). 
It is striking that all three independently detected complete HIV-1 
genomes we found are so closely related; moreover, they form a cluster 
with three partial env sequences sampled in SF during the same 
period!” (Extended Data Fig. 5b). This suggests that the bulk of the 
HIV-1 infections in SF in 1978 traced back to a single introduction 
from NYC in around 1976 (consistent with the lower HIV-1 seroprev- 
alence in the SF cohort). 

The sampled sequences thus reveal a series of key founder events in 
the genesis of subtype B (for example, Fig. 3 and Extended Data Table 1), 
with the epidemic spreading from the African HIV-1 group M epicentre 
to the Caribbean by about 1967, from the Caribbean to NYC by about 
1971 and from NYC to SF by about 1976, quickly followed by extensive 
geographical mixing in the US and beyond. 

Reports of one cluster of homosexual men with AIDS linked 
through sexual contact were important in suggesting the sex- 
ual transmission route of an infectious agent before the iden- 
tification of HIV-1 (refs 5, 11). Beginning in California, CDC 
investigators eventually connected 40 men in ten American cities 
to this sexual network. Investigators placed one man with Kaposi’s 
sarcoma near the centre of a sociogram representing this cluster 
and identified him as ‘Patient 0’—a ‘non-Californian AIDS patient’ 
and a possible ‘carrier’ of an infectious agent (Extended Data Fig. 7). 
Before publication, Patient ‘O’ was the abbreviation used to indicate that 
this patient with Kaposi’s sarcoma resided ‘Out(side)-of-California’. As
investigators numbered the cluster cases by date of symptom onset, the
letter ‘O’ was misinterpreted as the number ‘0’, and the non-Californian
AIDS patient entered the literature with that title”. Although the authors 
of the cluster study repeatedly maintained that Patient 0 was probably 
not the ‘source’ of AIDS for the cluster or the wider US epidemic, many 
people have subsequently employed the term ‘patient zero’ to denote an 
original or primary case, and many still believe the story today'*. We 
therefore recovered the complete HIV-1 genome of Patient 0 and exam- 
ined it against the backdrop provided by the 1970s sequences. 

Although labelled as the cluster study’s ‘index patient’, Patient 0
was neither the first AIDS case to come to CDC researchers’ 
Figure 3 | The early patterns of HIV-1 subtype B spread in the
Americas. The map summarizes the main patterns of spread inferred 
from the molecular clock phylogeographic analyses. The map inset shows 
the initial introduction of the subtype B lineage into the Caribbean from 
Africa. From there, the virus spreads first to NY and subsequently to 
different locations in the United States. The tree depicts the US clade, plus 
the most closely related basal Haiti strain, as inferred from the env 74 
analysis (Extended Data Fig. 5b). Tips of the clade correspond to the year 
of sampling. Tip branch colours reflect the actual sampling locations as 
indicated on the map; interior branches depict phylogenetically inferred 
locations using the same colour scheme. Diameters of internal node circles 


attention, nor the first to display symptoms. In general, the CDC 
numbered cases in the order that the reports reached the agency 
from different cities and employed the terms ‘cases’ and ‘patients’ 
interchangeably. Patient 0, until he was linked to the cluster and 
took on his new name, was Case (or Patient) 057. The cluster study’s 
LA 6 was the CDC’s Case 032, and several cases in the New York 
section of the cluster® (Extended Data Fig. 7) were also reported 
before Patient 0 (and thus brought to investigators’ attention first): 
NY 3 was Case 001, NY 2 was Case 002, NY 6 was Case 010 and NY 
5 was Case 053 (ref. 14). 

The information available for CDC investigators to establish symp- 
tom onset dates was often fragmentary and thus resisted uniform 
categorization. Sometimes onset was determined on the basis of lym- 
phadenopathy, other times by the appearance or diagnosis of Kaposi’s 
sarcoma or Pneumocystis carinii pneumonia. Investigators were 
unable to link to the cluster several NYC-based cases that had much 
earlier dates of symptom onset. For example, Case 154 was a middle- 
aged European man whose reported onset date for Kaposi’s sarcoma 
was January 1975; and Case 153, when he was diagnosed with Kaposi's 
sarcoma in September 1981, recalled having swollen glands as early 
as June 1977 (ref. 15). Yet even within the cluster, Case 057’s symp- 
toms (lymphadenopathy in December 1979 and a Kaposi’s sarcoma 
lesion diagnosed in May 1980, ref. 5) appeared considerably later than 
those of several other cases. LA 1 (Case 335) developed a lesion in 
February 1978 (ref. 16), whereas NY 1 (Case 152) experienced the onset 
of Kaposi’s sarcoma in December 1978, NY 2 (Case 002) in May 1979 
and NY 3 (Case 001) in August 1979 (ref. 14). 

In his book And the Band Played On, Randy Shilts identified ‘Patient
Zero’ by name as a highly sexually active French-Canadian flight
attendant!”. Unlike the initial reports of the cluster, media coverage of 
Shilts’s book strongly insinuated that this individual was the source of 
the North American epidemic and an exemplar of dangerous disease 
transmission'*—ideas which found a global audience (Supplementary 
based on the env 74 analysis. Date next to arrow between these locations
represents the estimated timing of the corresponding jump. Patient 0 
(represented by two sequences) and the earliest sequences from San 
Francisco (1978) and New York City (1979) are labelled. Maps made with 
Natural Earth. 


Discussion). However, we found that the HIV-1 genome from this indi- 
vidual appeared typical of US strains of the time and was not basal to 
the US diversity, let alone to the deeper Caribbean subtype B diversity, 
in a manner that might be suggestive of a special role (Figs 1, 3). In 
short, we found no evidence that Patient 0 was the first person infected 
by this lineage of HIV-1. 

In addition to donating plasma for analysis, Patient 0 provided 
investigators with the names of nearly 10% of his sexual partners 
over several years®, while many other cluster patients were una- 
ble to share more than a handful of names'®. This strongly sug- 
gests that ascertainment bias contributed to his central role in the 
cluster study and its diagrammatic representation. Later research 
would also call into question the cluster study’s estimated average 
latency period of 10.5 months between sexual contact and symp- 
tom onset, with a revised average incubation period approaching 
10 years for MSM. In retrospect, the study’s sociogram (Extended 
Data Fig. 7) almost certainly depicted the sexual contacts of these 
men years after they had contracted HIV-1 (ref. 19) (Supplementary 
Discussion). Other East coast HIV-1 sequences fall much closer to 
the main early-California clade we identify than does that of Patient 0 
(Fig. 3). Thus, while he did link AIDS cases in New York and Los 
Angeles through sexual contact, our results refute the widespread mis- 
interpretation that he also infected them with HIV-1. 

Much like historical reconstructions, phylogenetic inferences 
are often generated from data collected long after the critical events 
occurred. Our work highlights the importance of complete viral 
genomes from early archival specimens, carefully contextualized 
through historical analysis, without which this detailed picture of 
these early landmarks in the HIV/AIDS pandemic would not have 
been possible. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

HIV-1 serological screening of serum samples from San Francisco from 
1978. We tested 2,231 samples collected from the MSM cohort in San Francisco 
in 1978 (ref. 9) and detected 83 HIV-1-positives by Western Blot (3.7% preva- 
lence). Samples were first screened by GS HIV-1/HIV-2 Plus O EIA (Bio-Rad 
Laboratories) and reactive samples were further tested by WB Genetic Systems 
HIV-1 Western Blot (Bio-Rad Laboratories). 

HIV-1 nucleic acid amplification. A total of 33 samples of frozen serum from New 
York City previously identified as positive for antibody to HIV-1°* were assayed; 
and a total of 20 frozen serum samples from San Francisco’, identified as part of 
the present study as positive for antibody to HIV-1, were assayed. The New York 
City samples were from 1978 and 1979 though no complete genomic sequences 
from 1978 were developed. The San Francisco samples were all from 1978. RNA 
recovered from samples from both NY and SF was generally undetectable when 
assaying 5-µl aliquots in a Qubit 2.0 fluorometer using the Qubit RNA HS reagents 
(detection limit, 250 pg/µl). 

Additionally, a sample of peripheral blood mononuclear cells (PBMCs) and a 
sample of serum were both assayed; these had been collected from a single indi- 
vidual in 1983 (Patient 0), and the samples were stored at CDC Atlanta. Other than 
Patient 0, now deceased, the data recorded were unlinked to individual identifiers 
and the work was approved by the Human Subjects Protection Program at the 
University of Arizona. 

Four panels of degenerate primers (Supplementary Table 1 and Extended Data 
Fig. 1) were designed using a suite of North American subtype B sequences. We 
aimed to design primers able to amplify both conserved regions and predictably 
variable sites. Primers within each panel were designed to generate sequence from 
the 5’ end of gag to the 3’ end of nef and were designed to amplify overlapping 
fragments. Two panels, ‘HIVLa’ (n = 25) and ‘HIVLb’ (n = 22), were designed to amplify 
fragments of approximately 500-650 bases in length. Two other panels, ‘HIVM’ 
(n = 50) and ‘HIVR’ (n = 46), were designed to amplify fragments of approximately 
200-320 bases in length. 

Nucleic acids from 100-µl aliquots of serum (or PBMCs in the case of Patient 0) 
were isolated using the QIAamp viral RNA mini kit (Qiagen) with 5 mcg added 
carrier RNA. Serum samples were then treated with DNase I (Invitrogen, Life 
Technologies) before reverse transcription. PBMC nucleic acids were left untreated. 

Proviral DNA from Patient 0’s PBMCs was amplified with all four primer panels 
and from multiple separate isolations. Amplification was achieved using Invitrogen 
platinum Taq DNA polymerase high fidelity (Life Technologies) and run for 
55 cycles at an annealing temperature of 52°C. Additionally, attempts were made 
to amplify longer fragments using PCR supermix high fidelity (Life Technologies) 
and forward and reverse primers matched from the HIVLb primer panel for long 
fragment length followed by nesting with primers for slightly shorter fragment 
length. A single fragment of slightly more than 7,000 bases was generated after mul- 
tiple attempts with multiple primer combinations and cloned using the Invitrogen 
TOPO XL PCR cloning kit (Life Technologies). Fragments of individual clones 
were then amplified using HIVLb forward and reverse primers matched to give 
approximately 1,000-base overlapping fragments and then sequenced. 

RNA jackhammering. RNA jackhammering of the serum samples proceeded 
as follows: aliquots of RNA extract were reverse transcribed using the GoScript 
reverse transcription system (Promega) using a program of 4 cycles of 50°C for 
30 min followed by 55°C for 30 min and a final incubation at 85°C for 10 min. 
Primers used were pools of reverse primers from widely spaced amplicons 
(Supplementary Table 1, Extended Data Fig. 1), typically nine or ten primers per 
pool in a single reaction tube, with the wide spacing abrogating the possibility of 
incorporation of an internal primer into any given amplicon. Reverse transcription 
products were then briefly amplified in multiplex reactions in the pool-specific 
tube (denaturation for 3 min at 94°C followed by 30 cycles of 94°C for 30 s, 52°C 
for 30 s, 68°C for 30 s, and a final extension of 68°C for 5 min) with matching 
forward primer pools (a ‘preliminary amplification’ step). Sequences were then 
amplified from individual aliquots taken from the pool-specific tubes, via single 
primer pairs (denaturation for 3 min at 94°C followed by 40 cycles of 94°C for 
30 s, 52°C for 30 s, 68°C for 30 s, and a final extension of 68°C for 5 min). Two 
separate isolates were amplified from each sample in this manner, with a minimum 
of one amplification with each primer panel per isolate. Five out of the 33 (15%) 
of the NY sera assayed yielded complete HIV-1 genomic data as did 3 out of the 
20 (15%) SF sera, suggesting that levels of viral RNA preservation were very similar 
in each collection. 
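
As a quick check on the recovery figures above, the following Python sketch recomputes the two yields and adds a Wilson score interval for each. The interval is not part of the original analysis; it is included only to illustrate how much uncertainty surrounds proportions based on such small counts, and `wilson_interval` is a hypothetical helper name.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

# Counts reported in the text: complete genomes recovered / sera assayed.
yields = {"NY": (5, 33), "SF": (3, 20)}

for site, (recovered, assayed) in yields.items():
    lo, hi = wilson_interval(recovered, assayed)
    print(f"{site}: {100 * recovered / assayed:.1f}% recovered "
          f"(Wilson 95% interval {100 * lo:.1f}-{100 * hi:.1f}%)")
```

Both point estimates round to 15%, and under this assumed interval the two ranges overlap substantially, which is compatible with the reading that RNA preservation was similar in the two collections.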

In Extended Data Fig. 1 we schematically illustrate the RNA jackhammering 
approach and its advantages over standard RT-PCR procedures for degraded, low 
input samples. For a conventional RT-PCR approach with a fairly long amplification 
product, we would perform reverse transcription and obtain one potentially 
amplifiable cDNA product. We would then aliquot ~10% of the reverse transcription 
product for amplification in a PCR reaction with forward and reverse primers. Even 
if the single cDNA product made it into the PCR reaction, the desired amplification 
product would be too long and a PCR amplicon would therefore not be obtained. 
For RT-PCR with a shorter amplification product, more appropriately sized given 
the damaged RNA in the sample, there is still a 90% chance that the sample would be 
deemed negative, since most aliquots will not contain the rare cDNA 
product. Using multiple primer sets would increase the chance of a PCR-positive 
result, but most PCR reactions would remain negative because most aliquots lack target 
cDNA. Even with a 10 primer-pair pool and 10 final PCR reactions, there may 
be no amplified product. The RNA jackhammering approach targets large panels 
of appropriately short amplicons, uses discrete pools of non-overlapping 
primer pairs for reverse transcription, and includes a crucial multiplex 
pre-amplification step to ensure that each aliquot contains ample template molecules 
for the final PCR amplification (a separate reaction for each primer pair in the 
entire panel). 
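
The sampling argument above can be put in numbers with a minimal Python sketch. It assumes that each template molecule independently ends up in an aliquot with probability equal to the aliquot fraction (10% here); the function name and the post-pre-amplification copy number of 1,000 are hypothetical values chosen for illustration, not figures from the paper.

```python
def p_aliquot_has_template(copies, fraction=0.10):
    """Chance that an aliquot taking `fraction` of a tube contains at least
    one of `copies` template molecules, assuming independent sampling."""
    return 1.0 - (1.0 - fraction) ** copies

# One rare cDNA molecule and a 10% aliquot: the conventional RT-PCR case.
p_single = p_aliquot_has_template(1)

# Ten final reactions, each needing its own single-copy target in its own
# 10% aliquot: the chance that every one of them stays negative.
p_all_negative = (1.0 - p_single) ** 10

# After a multiplex pre-amplification step (hypothetical 1,000 copies per
# target), essentially every aliquot carries template.
p_after_preamp = p_aliquot_has_template(1000)

print(f"single copy, one aliquot positive: {p_single:.2f}")
print(f"ten single-copy targets, all negative: {p_all_negative:.2f}")
print(f"after pre-amplification, aliquot positive: {p_after_preamp:.4f}")
```

Under these assumptions, a single-copy target is missed 90% of the time, and roughly a third of ten-reaction experiments would yield no product at all, whereas pre-amplification makes a template-free aliquot vanishingly unlikely.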

Sequencing was performed at the University of Arizona Genetics Core using an 
ABI 3730XL. The Patient 0 sample contained considerable heterogeneity (mixed 
bases) both in proviral assembly and in viral RNA amplification. Heterogeneity in 
the NY and SF samples (all sequences derived from viral RNA) was low. In all cases 
consensus sequences were used in the phylogenetic analyses. Primer sequences 
were computationally removed from all sequence data before assembling genomic 
consensus sequences, which yielded coding-complete genomic data with exception 
of a few small gaps and the 3’ end of the nef gene (Supplementary Table 2). 

Validation of the jackhammering approach. To validate this approach we 
obtained seed stock samples from the NIH AIDS Reagent program of subtype B 
viruses from the US (US657) and Haiti (HT599) and applied a jackhammering 
approach with independent runs of both the HIVM and HIVR primer panels 
(Extended Data Fig. 8). 

For US657 we recovered, in total, from both runs combined, 8,194 nt of high 
quality data. HIVM and HIVR are independent runs with completely different 
primer sets, yet where the data overlapped, they were >99.9% similar. Moreover, 
the few heterogeneities did not line up with heterogeneous primers but fell in 
regions between primers, demonstrating that differences could not be attributed to 
the incorporation of primers into the recovered sequences. This was expected both 
because the wide spacing of amplicons within a single pool of primer pairs pre- 
vents incorporation of primers within amplified products and because all primer 
sequences from final amplification products were computationally removed from 
the sequences before assembly of genomic sequences. Our data covered about 90% 
of the 3,354 bases of the previously published US657 sequence (GenBank accession 
number U04908), and all of our individual amplicons in the region of overlap had 
US657 as the highest BLAST hit and were >99% similar to the published sequence. 

For HT599 the HIVM and HIVR primer panels developed 8,545 nt of data, 
99.6% of the target. The HIVM-derived sequence was >99.9% similar to the 
HIVR-derived sequence. We recovered 100% of the overlap with the previously 
published HT599 sequence (2,881 nt, GenBank accession number U08447) with 
99.5% similarity. 
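
The similarity figures quoted in this validation are percentage identities over aligned, overlapping positions. A minimal Python sketch of such a comparison is given below; it assumes the two sequences are already aligned to the same length and that gaps and ambiguous bases are simply skipped, which may differ from how the authors scored them. The function name and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, skip="-N?"):
    """Percent identity over positions where both aligned sequences have
    an unambiguous base; returns (identity, positions_compared)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # ignore gaps and ambiguous calls in either sequence
        compared += 1
        matches += (a == b)
    if compared == 0:
        raise ValueError("no overlapping positions to compare")
    return 100.0 * matches / compared, compared

# Toy aligned sequences (made up for illustration, not real HIV-1 data).
recovered = "ACGTACGTAC--"
published = "ACGTACGTATGG"
identity, n_sites = percent_identity(recovered, published)
print(f"{identity:.1f}% identical over {n_sites} compared sites")
```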

To evaluate discrepancies between the jackhammering-recovered sequences 
and both US657 and HT599, we compared consensus sequences of combined 
HIVM and HIVR data with the respective published sequences by adding them 
to our complete genome alignment and reconstructing a maximum likelihood 
tree (Extended Data Fig. 8a). As expected, the independently generated sequences 
from each virus clustered very closely and only had short tips from their common 
ancestors, resulting from a very small number of substitutions in their overlapping 
regions. In a root-to-tip analysis (Extended Data Fig. 8b), our sequences (with a 
target symbol) were associated with somewhat smaller residuals than the published 
sequences (with a circle), indicating that our data are likely to be more accurate 
and, importantly, cannot contain primer remnants as this would result in much 
larger residuals. 

Sequence data. To construct the data sets for the analyses shown in Fig. 1 and 
Extended Data Figs 2-4 we searched the Los Alamos National Laboratories 
(LANL) HIV database (http://hiv.lanl.gov/) for all available genome-length HIV-1 
sequences from Caribbean countries, which had previously been shown to exhibit 
diverse subtype B lineages that fall basal to a monophyletic ‘pandemic clade’ of 
subtype B that accounts for most US and other non-Caribbean subtype B infections. 
These included sequences sampled in Haiti, Dominican Republic, Jamaica and 
from Haitians who had recently immigrated to the US from Haiti (‘H3’ and ‘H5’ 
from 1982, ‘H6’ and ‘H7’ from 1983, ‘RF_HAT’ from 1983). For sequences H3, 
H5, H6 and H7 pol sequences were not available, but partial gag and full-length 
env sequences were available. For the full-genome analyses the pol gene was treated 
as missing data. We then added a similar number of genomes from the US from 
a similar time period (1982-2005), plus one each from France and the UK, as 
well as outgroup sequences of subtype D from the Democratic Republic of the 
Congo (D.R.C.). We called this the ‘full genome 46’ data set because it contained 
46 genomes. The gag, pol and env data sets depicted in Extended Data Fig. 3 were 
each derived from the respective sub-genomic region of this same set of taxa. The 
subset of ‘full genome 46’ that contained only those US sequences sampled from 
1978-1984 we called ‘full genome 38’. 

For the env analyses in Fig. 3 and Extended Data Fig. 5 the alignment from 
ref. 2 was used, with the addition of the sequences generated for the present study, 
additional Caribbean subtype B sequences from 2000 to 2005, and four early 
subtype B partial env sequences from San Francisco. This alignment we called 
‘env 105’. The subset that contained only those US sequences sampled from 1978-1984 
we called ‘env 74’. 

For Extended Data Fig. 6 we added to ‘env 105’ randomly sampled sequences 
from 1997-2007 from NY, SF and North Carolina (the closest available 
site with sufficient numbers to stand in for the Georgia ones from the 1978-1984 
sample), in a number comparable to those sampled from 1978-1984 from known 
locations (New York, California, Georgia, Pennsylvania, New Jersey) 
(Extended Data Fig. 4b). We called this alignment ‘env 133’. 

In all cases sequences were manually aligned using Se-Al (http://tree.bio. 
ed.ac.uk/software/seal/). All sequence alignments, input files, tree files and 
primer sequences are available at the Dryad Digital Repository (doi:10.5061/ 
dryad.7mv7v). 

Recombination analysis and maximum likelihood tree reconstruction. 
Maximum likelihood phylogenies were reconstructed using RAxML under a 
general time-reversible model of substitution with gamma distributed rate 
variation among sites. Bootstrap support values were calculated using 1,000 
pseudo-replicates. To detect the presence of recombination, we first performed the Phi 
test on every data set (Extended Data Table 1). When the null hypothesis of 
absence of recombination was rejected (P < 0.05), we subsequently analysed the 
data set using RDP4 (ref. 22) and produced new alignments in which the minor 
recombinant regions were deleted from putative recombinants. Re-analyses of these 
‘recombination-free’ data sets using the Phi test confirmed the absence of detectable 
recombination signal (P > 0.05, Extended Data Table 1). 

Bayesian phylogenetic inference. Time-measured phylogeographic histories were 
reconstructed using a Bayesian phylogenetic inference approach implemented in 
BEAST v1.8.2 (ref. 23). Our full probabilistic model combined sequence 
substitution over an unknown phylogeny calibrated in time units using a molecular clock 
process with dated tips, a coalescent tree prior and a discrete diffusion process 
among discrete location states. For the sequence substitution process, we used the 
same model as for the maximum likelihood reconstructions. We accommodated 
rate variation among lineages using a lognormal distribution in an uncorrelated 
relaxed molecular clock model and integrated out each sampling date over an 
uncertainty interval of one year. Visual inspections of root-to-tip divergence as a 
function of sampling time using TempEst indicated a strong temporal signal with 
no clear outlier sequences (Extended Data Fig. 9). 
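
The root-to-tip check described above amounts to regressing each tip's genetic distance from the root on its sampling date: the slope approximates the evolutionary rate and the x-intercept the age of the root. The sketch below implements that regression by ordinary least squares in plain Python. It is an illustration of the idea rather than TempEst itself, and the dates and distances are made-up values constructed to lie exactly on a line.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.
    Returns (slope, estimated root date, R squared)."""
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx                      # substitutions per site per year
    intercept = mean_y - slope * mean_x
    root_date = -intercept / slope         # date at which divergence is zero
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, root_date, r_squared

# Hypothetical tips: divergence accumulates at 0.002 subs/site/year from 1970.
toy_dates = [1978.0, 1979.0, 1983.0, 1990.0, 2000.0]
toy_distances = [0.002 * (d - 1970.0) for d in toy_dates]

slope, root_date, r2 = root_to_tip_regression(toy_dates, toy_distances)
print(f"rate = {slope:.4f}, root date = {root_date:.1f}, R^2 = {r2:.3f}")
```

With real data the points scatter around the line; a sequence far from it would be the kind of outlier the inspection is meant to flag.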

For most analyses, we flexibly modelled changes in effective population size 
through time by specifying a Bayesian skygrid non-parametric tree prior with a 
grid of 50 years and yearly effective population size parameters. (The notion of 
‘effective population size’, or ‘effective infections’ in epidemiological applications, 
comes from population genetics, and is typically lower than the full (that is, census) 
population size, reflecting, for example, variance among individuals in reproductive 
success, which in this context means transmissions to new hosts.) To estimate viral 
population growth rates in both the Caribbean and US populations, we fitted a ‘nested’ 
coalescent model to the data set with the largest taxon sampling (‘env 133’). This 
model fits a constant-logistic demographic function to the genealogy excluding 
the US clade. The initial constant phase was included in the model to accommodate 
the deep branching between the subtype B sequences and the African subtype D 
outgroup sequences. Nested within this model, a separate logistic growth model 
was fitted to the US clade in the genealogy. 

The process of discrete diffusion among locations was modelled using a general 
non-reversible substitution model. In our analyses including the African subtype D 
outgroup lineages, we set the root state frequency to one for the African state and 
zero for all other possible discrete states. We obtained estimates of the transitions 
among locations (Markov jumps) using a stochastic mapping implementation 
capable of inferring the complete Markov jump history. We approximated the 
posterior distribution for our full probabilistic model using Markov chain Monte 
Carlo (MCMC) sampling. We used BEAGLE in conjunction with BEAST to improve 
the computational performance of our analyses. MCMC chains were run for 
50,000,000 generations, sampling every 5,000 generations. We diagnosed the runs 
by examining trace plots and effective sample sizes, and summarized continuous 
parameters (mean and 95% highest posterior density (HPD) intervals) using Tracer 
(http://tree.bio.ed.ac.uk/software/tracer/) after discarding a 10% burn-in. Trees 
were summarized as maximum clade credibility trees using TreeAnnotator and 
visualized in FigTree (http://tree.bio.ed.ac.uk/software/figtree/). 
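
To make the summary step concrete, the sketch below works out how many samples the stated chain settings produce, discards a 10% burn-in, and computes a mean and a 95% HPD interval as the shortest window containing 95% of the retained samples. It is a plain-Python illustration and not Tracer's implementation; the simulated 'posterior' (normal draws centred on 1970) is invented, and a real log file may hold one extra row for the initial state.

```python
import random

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, int(round(mass * n)))        # samples inside the interval
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

chain_length, log_every = 50_000_000, 5_000   # settings stated in the text
n_logged = chain_length // log_every
burn_in = n_logged // 10                       # 10% burn-in
n_retained = n_logged - burn_in

# Hypothetical posterior sample of a date parameter, for illustration only.
rng = random.Random(1)
chain = [rng.gauss(1970.0, 1.5) for _ in range(n_logged)]
retained = chain[burn_in:]

mean = sum(retained) / len(retained)
low, high = hpd_interval(retained)
print(f"{n_logged} samples logged, {n_retained} retained after burn-in")
print(f"mean = {mean:.2f}, 95% HPD = ({low:.2f}, {high:.2f})")
```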

In two specific phylogeographic analyses, we assessed (i) to what extent 
sequences sampled early in the US epidemic characterize the subtype B diversity 
in the US clade (Extended Data Fig. 6a) and (ii) to what extent the location state 
at the origin of the US clade can be estimated using sequences sampled later in the 
epidemic from three different US states (Extended Data Fig. 6b). For this purpose, 
we first reconstructed time-measured phylogenies for the env 133 data set using 
the substitution model, molecular clock model and coalescent model described 
above and subsequently reconstructed ancestral locations on the inferred posterior 
distribution of trees. 

For Extended Data Fig. 6a, we classified US sequences as ‘early’ or ‘late’ 
depending on whether they were sampled before or after (and including) 1985. 
For Extended Data Fig. 6b, we first pruned the necessary US sequences from the 
posterior distributions in order to retain only ‘late’ sequences from New York, 
North Carolina and California (matching the sampling from New York, Georgia 
and California in Fig. 3 and Extended Data Fig. 5b). In this case, the support for a 
NYC ancestral state is likely upheld by the presence of two basal NYC representa- 
tives, but location estimates in a star-like tree structure with long tip branches will 
be critically dependent on how well the diversity of any location is represented in 
the contemporaneous sampling, as recently noted. 

Comparison of phylogeographic estimates before and after deleting minor 
recombinant regions from putative recombinants (Extended Data Table 1) indi- 
cated highly consistent results. 

Data availability. All sequence alignments, input files, tree files and primer 
sequences are available at the Dryad Digital Repository (doi:10.5061/dryad. 
7mv7v). 

The HIV-1 sequences reported here have been deposited in GenBank under 
accession numbers KJ704787, KJ704788, KJ704789, KJ704790, KJ704791, 
KJ704792, KJ704793, KJ704794, KJ704795, KJ704796 and KJ704797. 
Extended Data Figure 1 | Jackhammering schematic and primer panels 
and pools. a-d, Detection and amplification of target RNA molecules 
in old, degraded, low-titre samples. For the purposes of illustration, 
consider a tube with 10! RNA molecules, but (because of the low RNA 
quality) only one molecule that is (i) capable of being primed by the 
given reverse primer(s) and (ii) long enough to form a 200-bp product. 
a, Conventional RT-PCR with a long amplification product, oversized 
for a sample with RNA less than ~200 bases in length. b, RT-PCR with a 
shorter amplification product. c, Use of multiple primer pairs to increase 
the chance of at least one PCR-positive result. d, The jackhammering 
approach, which overcomes the problems encountered in a-c by 
(i) targeting an extensive panel of short amplicons appropriately sized 
to the level of RNA survival in the sample, (ii) conducting reverse 
transcription with pools of primer pairs that amplify discrete, 
non-overlapping genomic regions, and (iii) employing a multiplex 
pre-amplification step, in the tube with the reverse transcription product, 
to generate sufficient DNA to ensure that each aliquot from it contains 
numerous template molecules for final PCR amplification. In this 
schematic, we show just two primer pairs per pool, but we used pools of 
ten pairs with our largest primer panels (shown in e, HXB2 numbering 
along HIV-1 genome). With a 10 primer-pair pool, and 10 final reactions, 
one can reliably recover 10 bands for sequencing. Five such pools 
(one entire panel of 50 pairs) allow complete HIV-1 genome recovery 
even in heavily degraded samples. 
Extended Data Figure 2 | Maximum clade credibility (MCC) tree 
summaries of Bayesian spatio-temporal reconstructions based on 
complete HIV-1 genome data. a, ‘full genome 46’; b, ‘full genome 38’. The 
tips of the trees correspond to the year of sampling while the branch 
(and node) colours reflect location: the sampling location for the tip 
branches and the inferred location for the internal branches. AF, Africa; 
CB, Caribbean; US, the United States; CA, California; GA, Georgia; NY, 
New York. The diameters of the internal node circles reflect posterior 
location probability values. Thick outer circles represent internal nodes 
with posterior probability support >0.95. Grey bars indicate the 95% 
credibility intervals for the internal node ages. The tree in b represents the 
fully annotated version of the tree in Fig. 1 in the main text. 
Extended Data Figure 3 | Maximum clade credibility (MCC) tree 
summaries of Bayesian spatio-temporal reconstructions based on 
different genome region data sets. MCC trees for the same strains are 
shown for a, gag, b, pol, c, env and d, the complete genome. The tips of 
the trees correspond to the year of sampling while the branch (and node) 
colours reflect location: the sampling location for the tip branches and 
the inferred location for the internal branches. AF, Africa; CB, Caribbean; 
US, the United States. Tip labels are provided for the newly obtained 
archival HIV-1 genomes. The diameters of the internal node circles reflect 
posterior location probability values. Thick outer circles represent internal 
nodes with posterior probability support >0.95. We also depict the 
posterior probability densities for the time of the introduction event from 
the Caribbean into the US on the time scale of the trees. 
Extended Data Figure 4 | Maximum likelihood phylogenies for the 
different genome region data sets. a, gag, b, pol, c, env and d, the complete 
genome. We analysed the same data sets as in Extended Data Fig. 3. The 
diameters of the internal node circles reflect bootstrap support values. We 
manually coloured the branches in a similar way as for the Bayesian 
phylogeographic reconstructions. 
Extended Data Figure 5 | Maximum clade credibility (MCC) tree 
summaries of Bayesian spatio-temporal reconstructions based on 
different env data sets. a, ‘env 105’; b, ‘env 74’. The tips of the trees 
correspond to the year of sampling while the branch (and node) colours 
reflect location: the sampling location for the tip branches and the 
inferred location for the internal branches. AF, Africa; CB, Caribbean; 
US, the United States, CA, California; GA, Georgia; NJ, New Jersey, 
depict the posterior probability density for the time of the introduction 
event from the Caribbean into the US on the time scales of the trees. The 
three partial env sequences from SF in 1978 (ref. 10) are highlighted with 
bullets. 
Extended Data Figure 6 | Maximum clade credibility (MCC) tree 
summaries of Bayesian spatio-temporal reconstruction comparing 
early and late strains. a, ‘env 133’; b, only ‘late’ sequences from ‘env 133’. 
In a, we classified US sequences as ‘early’ or ‘late’ depending on 
whether they were sampled before or after (and including) 1985. In 
b, the analysis was conducted on an empirical tree distribution of ‘env 133’ 
from which we pruned early US sequences (in grey), but we still annotate 
the reconstruction on the complete phylogenies for reference. The tips of 
the tree correspond to the year of sampling while the branch (and node) 
colours reflect location: the sampling location for the tip branches and the 
inferred location for the internal branches. AF, Africa; CB, Caribbean; US 
early, the United States sampled <1985; US late, the United States sampled 
in or after 1985; CA, California; GA, Georgia; NC, North Carolina, NY, 
New York. The diameters of the internal node circles reflect posterior 
location probability values. Thick outer circles represent internal nodes 
with posterior probability support >0.95. 
Extended Data Figure 7 | A cluster of 40 early AIDS patients linked through sexual contact. Reprinted from figure 1 of ref. 5 with permission from Elsevier. 
regressions based on the maximum likelihood trees (Extended Data 
used for sampling locations in the phylogenies. The US data points with 
black circles represent the new genomes dating back to 1978-1979. The 
data point with the target symbol represents the Patient 0 genome. In each 
plot, we provide the R² for the regression and the slope, reflecting the 
evolutionary rate (in substitutions per site per year). 
Extended Data Table 1 | Molecular clock, phylogeographic and recombination estimates for the different data sets 

Columns: Data set | TMRCA (subtype B & D) | TMRCA (subtype B) | Location probability (subtype B) | Jump time (CB to US) | TMRCA (US subtype B) | Location probability (US subtype B) | Evolutionary rate | Rate, coefficient of variation | Phi test p-value 

“full genome 1967 CB: 0.99 1971 1972 NY: > 0.99 0.0024 0.26 
38”, Fig.1&  (1946,1962)  (1963,1970) (1968.1973)  (1970,1974) (0.0021,0.0027) — (0.20,0.32) 
ED Fig. 2 


“pol”, ED CB: 0.92 1973 US: > 0.99 0.0015 


Fig. 3 


“env, 1952 1968 CB: 0.99 1970 1971 US: 0.99 0.0039 0.59 
recomb. (1940,1961)  (1964,1972) (1966,1973)  (1967,1973) (0.0031,0.0047)  (0.18,0.35) 
rine 


“env 105, 1955 1968 CB: > 0.99 1970 1971 US: > 0.99 0.0047 x 
recomb. (1947,1961)  (1974,1970) (1968.1972)  (1969,1972) (0.0041 ,0.0053) (0.18,0.28) 
free”* 


“env 74, CB: 0.99 :0. 
recomb. (1948,1964)  (1964,1972) (1968,1973) (1970,1974) (0.0038,0.0054) (0.23,0.39) 


334 


free 


*The recombination-free (‘recomb. free’) data sets were obtained by deleting the minor recombinant regions from the putative recombinants identified using RDP4. 
†The empirical trees from the ‘env 133’ analysis were used for two different ancestral reconstructions (Extended Data Fig. 6); here we list the location estimates for the analysis that considered 
different US states for the late samples (Extended Data Fig. 6b). 
Single-cell RNA-seq identifies a PD-1hi ILC 
progenitor and defines its development pathway 


Yong Yu, Jason C. H. Tsang, Cui Wang, Simon Clare, Juexuan Wang, Xi Chen, Cordelia Brandt, Leanne Kane, 
Lia S. Campos, Liming Lu, Gabrielle T. Belz, Andrew N. J. McKenzie, Sarah A. Teichmann, Gordon Dougan & Pentao Liu 


Innate lymphoid cells (ILCs) functionally resemble T lymphocytes 
in cytotoxicity and cytokine production but lack antigen-specific 
receptors, and they are important regulators of immune responses 
and tissue homeostasis. ILCs are generated from common 
lymphoid progenitors, which are subsequently committed to 
innate lymphoid lineages in the α-lymphoid progenitor, early 
innate lymphoid progenitor, common helper innate lymphoid 
progenitor and innate lymphoid cell progenitor compartments. 
ILCs consist of conventional natural killer cells and helper-like cells 
(ILC1, ILC2 and ILC3). Despite recent advances, the cellular 
heterogeneity, developmental trajectory and signalling dependence 
of ILC progenitors are not fully understood. Here, using single-cell 
RNA-sequencing (scRNA-seq) of mouse bone marrow progenitors, 
we reveal ILC precursor subsets, delineate distinct ILC development 
stages and pathways, and report that high expression of programmed 
death 1 (PD-1hi) marked a committed ILC progenitor that was 
essentially identical to an innate lymphoid cell progenitor. Our data 
defined PD-1hiIL-25Rhi as an early checkpoint in ILC2 development, 
which was abolished by deficiency in the zinc-finger protein Bcl11b 
but restored by IL-25R overexpression. Similar to T lymphocytes, 
PD-1 was upregulated on activated ILCs. Administration of a PD-1 
antibody depleted PD-1hi ILCs and reduced cytokine levels in an 
influenza infection model in mice, and blocked papain-induced 
acute lung inflammation. These results provide a perspective for 
exploring PD-1 and its ligand (PD-L1) in immunotherapy, and allow 
effective manipulation of the immune system for disease prevention 
and therapy. 

We purified individual Lin−Flt3lo/−IL-7Rαlo/+α4β7+ bone marrow 
cells for scRNA-seq analysis (4 × 96 wild type and 2 × 96 
Bcl11b-deficient) (Extended Data Fig. 1a). After sequencing quality control 
(Extended Data Fig. 1b-j), 325 wild-type cells (172 CD244+CD25−, 
84 CD244−CD25− and 69 CD244−CD25+), and 172 Bcl11b-deficient 
cells (Lin−Flt3−IL-7Rα+α4β7+) were further analysed. Comparison of 
External RNA Controls Consortium (ERCC) RNA spike-ins in libraries 
did not reveal significant differences between experiments, indicating 
minimal batch bias (Extended Data Fig. 1g). 

In total, 8,758 genes showed significant biological expression 
variability (ref. 11) (Extended Data Fig. 1h) and were used for dimensional 
reduction analysis by t-distributed stochastic neighbour embedding 
(t-SNE). A distinct aggregation pattern of ILCs was observed in the 
wild-type dataset (Fig. 1a), where cells were roughly grouped into 10 clusters, 
probably representing bone marrow cells of distinct lineages and/ 
or developmental stages. Two major distinct subpopulations were 
immediately evident on the basis of expression of the ILC regulators Id2, 
Gata3 and Il7r: Id2loGata3loIL-7Rαlo (clusters 1, 2 and 3, denoted as 
C1–C3) and Id2hiGata3hiIL-7Rαhi (C6–C10) (Fig. 1b–d). Expression of 
megakaryo-erythrocytic and myeloid regulators in the Id2loGata3loIL-7Rαlo 
cells (C1–C3) suggested that they are non-ILCs (Fig. 1d, Extended Data 
Figs 2, 3). Bcl11a is expressed in common lymphoid progenitors, but 
not in T cells or conventional NK cells (cNKs). Bcl11a decreased in 
C4 and was very low in other clusters (Fig. 1b). 

Many C4 and C5 cells expressed Il7r or Gata3 (Fig. 1b–d). C4 
had high Notch1 and detectable Flt3, and a small number of C4 cells 
expressed Nfil3 (Fig. 1d), which is important in α-lymphoid progenitor 
(αLP) development. C4 cells expressed no Tcf7 (which encodes 
TCF-1), Rorc (RORγt), Cxcr5, Id2 or Zbtb16 (PLZF) (Fig. 1c, d), and 
therefore are probably αLP1 progenitors. By contrast, C5 had high 
Tcf7, Gata3, Tox, Tox2 (ref. 15) and Ets1 (ref. 16) expression, but did 
not express Bcl11a or Flt3 (Fig. 1b–d). C5 cells were thus reminiscent 
of early innate lymphoid progenitors (Lin−TCF-1+Thy-1−IL-7Rα−), 
which have the potential to generate all ILC lineages. Notch signalling 
induces TCF-1 expression in T-cell and ILC development. Concordantly, Notch1 
peaked in C4 before Tcf7 and Hes1 induction in C5 (Extended Data 
Fig. 2). We only observed a small number of C5 cells in the wild-type 
dataset, possibly owing to our preferential sorting for IL-7Rαlo/+ cells. 

Common helper innate lymphoid progenitors (CHILPs) are defined 
as Lin−Id2+IL-7Rα+α4β7+CD25− (ref. 5). Almost all C6–C10 cells 
highly expressed Id2 (Fig. 1b–d). C6–C9 cells expressed Il7r, Itga4 and 
Itgb7 (encoding integrin α4β7), but not Il2ra (CD25), and probably 
represented the CHILP compartment. Specifically, C6 cells expressed high 
Zbtb16 and Il7r, Kit, Itga4 and Itgb7, but low Myc (Fig. 1c, Extended 
Data Fig. 2), which is consistent with the innate lymphoid cell progenitor 
(ILCP) phenotype (Lin−IL-7Rα+c-Kit+α4β7+PLZFhi) (ref. 4) and the 
finding that Myc is suppressed by PLZF. 

C7a cells highly expressed many ILC2 genes such as Bcl11b, Icos, Rora 
and Gata3, and notably, C8–C10 cells gradually increased in expression 
of ILC2 genes including Il2ra, Il1rl1 (IL-33R), Bmp7 and Pparg (Fig. 1d, 
Extended Data Fig. 2). Therefore, ILC2 development probably follows 
the trajectory from C7a to C10. Through scRNA-seq, we discovered 
potential new ILC2 genes that were expressed primarily in C8–C10 
cells: for example, Itgb3, Pbxip1, 1700113H08Rik (in C8 cells); Ccr8, 
Gclc (in C9 cells); and Cish (in C10 cells) (Extended Data Fig. 3). In 
contrast to C7a cells, C7b cells expressed genes enriched in ILC1s and/or 
cNKs, for example, Tbx21 (T-bet), Il2rb (CD122), Ncr1 (NKp46), Cxcr3 
and Ctsw (Extended Data Figs 2, 3). RORγt+ ILC3 precursors are rare 
in the adult mouse bone marrow, whereas ILC2 precursors are well 
represented. A small number of C7c cells exhibited features of ILC3 cells: 
high expression of Rorc, Ahr, Ccr7 and Cxcr5 (Extended Data Figs 2, 3). 


1Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK. 2Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, 
Veterinary Science, Shanghai Academy of Agricultural Sciences, Shanghai 201106, China. °Shanghai Institute of Immunology, Shanghai Jiaotong University School of Medicine, Shanghai 200025, 

Figure 1 | scRNA-seq analysis of innate lymphoid progenitors. a, Biaxial 
t-SNE clustering plot of 325 wild-type Lin−Flt3lo/−IL-7Rαlo/+α4β7+ bone 
marrow cells. Each point represents an individual cell. Cells were grouped 
into 10 distinct clusters (C1–C10, colours indicated). b, Distribution of 
gene expression in t-SNE plot. The colour key indicates the expression 
level. c, Violin plots comparing expression of key ILC regulators in 
different clusters. The y axis indicates the log2 (normalized count + 1) 
expression levels. The black point indicates the mean expression level. 
d, Heat map of the expression levels of selected genes in different clusters. 


Notably, we observed distinct T-cell-receptor transcript expression 
demarcation between C1-C4 (non-ILCs and uncommitted progeni- 
tors) and C5-C10 (ILC progenitors) (Extended Data Fig. 4). 

ILC progenitor isolation currently necessitates the use of multiple 
cell surface markers and genetic reporters (Id2, Tcf7 or Zbtb16). 
We sought to find novel ILC markers that can remove this obstacle. 
Closer examination of C6 revealed high levels of Pdcd1 (Fig. 2a), which 
encodes PD-1, a cell surface receptor and a member of the 
immunoglobulin superfamily. Notably, Pdcd1 and Zbtb16 were positively 
correlated in C6 cells (Extended Data Fig. 5a). Pdcd1-expressing C6 cells 
also expressed Id2, Tcf7, Tox and Gata3, but lacked Flt3 or markers of 
mature ILCs (Extended Data Fig. 5b). Unlike activated T cells, C6 cells 
did not express other inhibitory or activation molecules, or the PD-1 
ligand genes Pdcd1lg1 (also known as Cd274) or Pdcd1lg2 (Extended 
Data Fig. 5b), suggesting that high Pdcd1 expression was unique to ILC 
progenitors in the Lin− compartment. 

We then isolated Lin−PD-1hi cells (PD-1hi hereafter) and confirmed 
strong expression of multi-lineage ILC regulators: PLZF, TCF-1, TOX 
and GATA3 (refs 4, 6, 15, 21) (Fig. 2b). They were positive for ILC 
progenitor markers but not Flt3 or lineage-committed markers (Extended 
Data Fig. 5c). PD-1hi cells showed in vivo developmental potential in 
adoptive transfer experiments in alymphoid Rag2−/−Il2rg−/− mice. In 
addition, engrafted control common lymphoid progenitors produced B 
cells, T cells and all ILCs (cNK: NK1.1+CD49a−CD49b+, ILC1s, ILC2s 
and ILC3s, including lymphoid-tissue-inducer (LTi) cells), whereas 
PD-1hi cells generated liver NK1.1+CD49a+CD49b− ILC1 cells, ILC2 
and ILC3 cells, and a small number of cNKs, but no B, T or CCR6+ 
LTi cells (Extended Data Fig. 5d). Within the experimental time 
window, common lymphoid progenitors produced PD-1hi cells in the bone 
marrow, but all of the donor PD-1hi cells lost PD-1 expression in the 
recipients, indicating their differentiation (Extended Data Fig. 5d). 

To determine the precursor-descendant relationship, we seeded 
individual PD-1hi bone marrow cells on stromal cells and assayed their 
progeny by flow cytometry. Individual PD-1hi cells generated 
mixtures of ILC lineages (two or three lineages of 
ILC1s, ILC2s and ILC3s) in 22 out of 98 wells (22.4%) (Extended Data 
Fig. 5e), demonstrating that they were enriched in both unipotent and 
multipotent ILC progenitors. 

We next directly compared PD-1hi cells to PLZFhi cells (ILCP) 
or CHILPs. In the Zbtb16 reporter mouse, in which Zbtb16 
(PLZF) expression is tracked by GFP, the majority of PD-1hi cells 
also expressed high PLZF as detected by flow cytometry (Fig. 2c). 
Co-transplantation confirmed their similar in vivo developmental 
potential (Extended Data Fig. 6a). Therefore, innate lymphoid cell 
progenitors could now be isolated by simple PD-1 staining, without relying 
on a transgenic reporter. In the Id2 reporter mouse, PD-1hi cells 
were all CHILPs (Fig. 2d), whereas about 50% of CHILPs were PD-1hi 
cells (Fig. 2e). Many PD-1lo/− cells among CHILPs increasingly expressed 
lineage markers and downregulated c-Kit (Fig. 2f), indicating potential 
lineage commitment. 

To estimate the heterogeneity in ILC progenitor compartments, we 
crossed Id2 reporter mice with Bcl11btdTomato conditional knockout mice 
(Extended Data Fig. 6b). Flow cytometry analyses detected Bcl11b+ 
cells in about 50% of CHILPs and 40% of PD-1hi cells, respectively 
(Extended Data Fig. 6c, d). PD-1hiBcl11b+ bone marrow cells produced 
ILC2s only, whereas in clonal assays PD-1hiBcl11b− cells generated 
mixtures of ILCs of two and three lineages in as many as 50% of wells 
and showed ILC1–3 potential in vivo (Extended Data Figs 5e, 6e). 

To study the PD-1hi compartment further, we performed scRNA-seq 
analysis of a further 184 PD-1hi cells. Clustering analysis showed 
polarized expression of Il17rb (46 out of 184, 25%) (Extended Data Fig. 7a). 
The differential Il17rb expression and spatial proximity in t-SNE were 
largely consistent with that from hierarchical clustering (Extended Data 
Fig. 7b), revealing that Pdcd1 expression was associated with Zbtb16, 
Tcf7, Tox and Tox2, and that those cells with low levels of Pdcd1 or Tox 
expressed higher Il17rb, indicating that they were primed or committed 
to an ILC lineage. 

In the initial scRNA-seq dataset, prominent expression of Il17rb 
was first detected in C7a and reached the highest levels in C8 and C9 
(Fig. 3a). Gene set enrichment analysis revealed that from C7a to C8, 
the ILC2 signature was increased, whereas the ILC1 and cNK signature was 
downregulated (Fig. 3b), and that Il17rb and Bcl11b were among those 
showing spike or peak expression at C7a to C8 (Extended Data Fig. 8a). 

Experimentally, about 20–30% of the PD-1hi cells were IL-25R+, all of 
which were Bcl11b+, and clonal differentiation and adoptive transfer 
experiments confirmed that IL-25R+ cells were ILC2-restricted 
(Extended Data Fig. 8b–d). These data revealed that IL-25R marked 
early ILC2 progenitors in PD-1hi cells, and that Bcl11b is essential 
for ILC2 development. To determine the exact stage at which ILC2 
development becomes defective in the absence of Bcl11b and to 
investigate the relationship between Bcl11b and Il17rb, we generated the 
Vav-Cre-Bcl11bfl/fl mouse, in which Bcl11b was specifically deleted in 
haematopoietic cells, and found that neither PD-1hiIL-25R+ cells (Fig. 3c, d) 
nor peripheral ILC2s (Extended Data Fig. 8e) were present. scRNA-seq 

Figure 2 | PD-1 high expression marks innate lymphoid progenitors. 
a, Distribution of Pdcd1 and Zbtb16 expression in t-SNE plots. 
b, Expression of key ILC regulators in PD-1hi cells. Protein expression 
was measured by intracellular antibody staining. c, Comparison 
of PD-1hi and PLZFhi cells in the Zbtb16 reporter mice (n = 4). 
d, FACS plots show all PD-1hi cells were CHILPs using the Id2 reporter 
mice (n = 6). e, PD-1hi cells are a subset of the CHILP compartment 
(n = 6). f, Detection of ILC lineage markers on PD-1lo/− or PD-1hi cells 
in CHILPs. Data are representative of two (b, c) or three (d–f) 
independent experiments. 

of Bcl11b-deficient bone marrow cells (Lin−Flt3−IL-7Rα+α4β7+) further 
revealed the lack of ILC2 progenitors corresponding to those 
in C8–C10 of the wild-type bone marrow (Fig. 3e). Intriguingly, the 
mutant C6 cells still expressed high levels of key ILC regulators Gata3, 
Tcf7, Tox and Id2 (Extended Data Fig. 9a, b), and retained an ILC2 
signature similar to wild-type cells (Fig. 3f). Therefore, Bcl11b deficiency 


Figure 3 | scRNA-seq dissection of early ILC2 development. 
a, Distribution of Il17rb in t-SNE biaxial plot and violin plot in C6–C10 
clusters. b, Gene set enrichment analysis of expression changes. Stepwise 
comparisons between clusters of each transition step were performed 
and tested for enrichment of ILC1/cNK, ILC2 and ILC3/LTi signatures. 
The y axis indicates the normalized enrichment score, the x axis indicates 
the transition steps tested. Positive and negative enrichments indicate a 
relative enrichment or depletion in ILC signature genes at that transition, 
respectively. c, FACS analysis of PD-1hiIL-25R+ cells in the control or 
Vav-Cre-Bcl11bfl/fl mice (n = 3 per genotype). d, PD-1hi and PD-1hiIL-25R+ 
cell numbers in one femur of a control or mutant mouse (n = 6). 
**P < 0.01 (two-tailed t-test). e, t-SNE plot of sequenced wild-type and 
Vav-Cre-Bcl11bfl/fl bone marrow cells. C8–C10 cells were missing in the 
mutant, indicating a developmental blockade. f, Gene set enrichment 
analysis comparing the ILC2 signature of C6 cells of wild-type and 
mutant bone marrow cells. g, Overexpression of Il17rb rescued ILC2 
development of Bcl11b-deficient PD-1hi cells in vitro. The infected cells 
were cultured on OP9-DL1 stromal cells and analysed two weeks later. 
h, The proposed ILC2 lineage development pathway. The red arrows 
indicate the developmental transition from one cluster to another. Data are 
representative of three (c, d) or two (g) independent experiments. 

Figure 4 | PD-1hi marks effector ILCs. a, PD-1hi cells in the influenza 
A/X-31-infected (day 5) Rag1−/− mice (n = 3). b, c, PD-1hi ILC2s and 
IL-13+ ILC2s in infected mice (n = 3 per time point). d, Only PD-1hi cells 
in infected mice produced high levels of IL-13. IL-13 was detected in the 
Il13+/tdTomato mice (n = 6). e–j, Depletion of PD-1hi ILCs by J43 antibody in 
infected mice. An isotype control antibody was used as a negative control. 
Lung tissue was collected at the indicated time points (n = 5 per treated 
group). e, f, Lung ILC2s were quantified as shown (7 days after infection). 
ILC2s: Lin−CD45+IL-7Rα+Bcl11b+. g, h, Percentage of lung PD-1hi ILC2s 
as shown (7 days after infection). i, Percentage of lung IL-13+ ILC2s 
(7 days after infection). j, Type-2-related cytokines in BALF of the mice 
of different genotypes or treatments (n = 3 per group per time point): 
Rag1−/− mice with mock infection; Rag1−/− mice infected with A/X-31 
and treated with either an isotype control or J43. Rag2−/−Il2rg−/− mice 


blocked ILC2 development at the C7a to C8 transition (Fig. 3e), where 
expression of Bcl11b and Il17rb peaked (Extended Data Fig. 8a). These 
observations led us to explore the possibility that overexpressing Il17rb 
may restore Bcl11b-deficient ILC2 development. We thus expressed 
transgenic Il17rb in Bcl11b-deficient PD-1hi cells. The transgenic mutant 
cells had typical ILC2 markers (Extended Data Fig. 9c) and produced 
IL-5 (Fig. 3g). Therefore, in the ILC2 developmental trajectory (Fig. 3h), 
Bcl11b functions through IL-25R signalling at the early commitment 
checkpoint (C7a to C8). Upon IL-25 administration, the 
Vav-Cre-Bcl11bfl/fl mice produced neither ‘natural’ ILC2 nor ‘inflammatory’ 
ILC2 cells and did not show increased IL-5 levels (Extended Data 
Fig. 8f). On the other hand, NK1.1+NKp46+ ILCs were increased in 
the Bcl11b-deficient bone marrow and small intestinal lamina propria 
(siLP) (Extended Data Fig. 8e). In adoptive transfer experiments, the 
mutant PD-1hi ILC progenitors developed into ILC1s and ILC3s, but 
not ILC2s (Extended Data Fig. 8g). 

PD-1 is an inhibitory receptor on activated T cells and a checkpoint 
molecule. It is being exploited for re-activating cytotoxic T cells in 
cancer immunotherapy. We further expanded our scope to study 
PD-1 expression in tissue-resident ILCs. At steady state, PD-1+ cells 
accounted for 20–40% of ILC2s, 20–30% of ILC3s and approximately 76% of 
siLP LTi cells, but fewer cNK or ILC1 cells (Extended Data Fig. 10a, b). 
By contrast, influenza infection substantially increased the number 
of lung PD-1hi ILC2s, and to a lesser extent PD-1hi cNK and ILC1 
infected with A/X-31 as the ILC-deficient control. k–n, Papain challenge 
of Rag1−/− mice. The mice were pretreated with J43 or the isotype control 
for 3 days and administered papain (intranasally) for 5 consecutive 
days. The lung tissues and BALF were collected at day 6 for analysis 
(n = 5 per treatment). k, l, Eosinophils (SiglecF+CD11c−) in the BALF 
were analysed as shown. m, Haematoxylin and eosin staining of lung 
sections. Inflammation with groups of eosinophils was observed in the 
isotype control lung. J43 treatment shows a reduction in the number of 
eosinophils. n, Periodic-acid–Schiff staining of lung sections. Positively 
stained goblet cells were found in the isotype control lung but not in 
the J43-treated lung in an equivalent section of respiratory epithelium. 
Scale bar (m, n), 50 μm. Error bars (b, c, f, h, i, j, l) denote s.e.m. Data are 
representative of two (a–n) independent experiments. *P < 0.05, 
**P < 0.01 (two-tailed t-test). 


cells (Fig. 4a). PD-1hi ILC2s were abundant on day 3 and day 7 after 
infection, but much less so at 14 or 21 days after infection, when mice 
had recovered from the infection (Fig. 4b). Therefore, PD-1 may be an 
indicator of ILC activation, similar to that in T and B cells. Indeed, in 
Il13+/tdTomato reporter mice, all IL-13+ ILC2s were PD-1hi (Fig. 4c, d). 

To explore the functions of PD-1hi ILCs further, we injected a PD-1 
antibody (denoted as J43), which can reduce mouse CD4+PD-1+ 
T-cell numbers by complement-dependent cytotoxicity. Repeated 
administration of J43 reduced PD-1hi T cells and tissue-resident ILCs 
in adult mice (detected by an alternative PD-1 antibody, RMP1-30) 
(Extended Data Fig. 10c). The failure to detect PD-1 was not due to 
epitope competition, as J43 and RMP1-30 effectively co-stained lung 
ILC2s (Extended Data Fig. 10d). We next pre-conditioned Rag1−/− 
mice with J43 before influenza infection. Seven days after infection, J43 
injection caused a reduction of total lung ILC2s (Bcl11b+) (Fig. 4e, f), 
with almost complete loss of PD-1hi ILC2 cells (Fig. 4g, h) and 
IL-13-producing ILC2s (Fig. 4i). Moreover, type 2 cytokines were sparsely 
detected in bronchoalveolar lavage fluid (BALF) of J43-treated mice 
(Fig. 4j). Similarly, PD-1hi ILC1s, cNKs and ILC3s were also depleted 
with reduction of type 1 and 3 cytokines (Extended Data Fig. 10e, f). 
Anti-PD-1 treatment may thus provide an effective approach to 
modulate ILC responses. 

The protease allergen papain causes occupational asthma and airway 
inflammation by activating a potent ILC2 cytokine response. 
We found that papain challenge increased lung PD-1hi ILC2 cells in 
the Rag1−/− mice (Extended Data Fig. 10g). Unlike ILC2 
depletion with a CD25 antibody, J43 treatment specifically depleted 
PD-1hi and IL-5-producing ILC2 cells (Extended Data Fig. 10h), 
and resulted in less eosinophil accumulation in BALF (Fig. 4k, l). 
Histological examination of lung sections revealed that J43 reduced 
lung eosinophils (Fig. 4m). Periodic-acid–Schiff staining demonstrated 
inhibition of mucus production (Fig. 4n). Depleting PD-1hi ILCs 
thus effectively prevented lung inflammation caused by papain. 

scRNA-seq of bone marrow ILCs thus enabled identification of 
distinct ILC progenitor subsets, delineation of ILC development, and 
discovery of PD-1 as a marker of innate lymphoid progenitors and 
effector ILCs. These results present new therapeutic opportunities for 
manipulation of ILCs to achieve optimal immune responses. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 

Mice. The Bcl11b conditional knockout reporter mice were generated 
on a C57BL/6 background (Extended Data Fig. 6b). Vav-Cre (018968) and 
Zbtb16 reporter (024529) mice were purchased from The Jackson Laboratory. Id2 reporter 
mice were from a C57BL/6 background. Il13+/tdTomato mice were from a BALB/c 
background. Rag2−/−Il2rg−/− mice (C57BL/6 CD45.1) were used as recipients 
for adoptive transfers. Mice aged 6–12 weeks were used for all experiments. Both 
sexes were included without randomization. No statistical method was used to 
predetermine sample size. The number of mice used in each experiment to reach 
statistical significance was determined on the basis of preliminary data. Littermate 
controls were used whenever possible. All mice used were maintained at the RSF of 
the Sanger Institute. Housing and breeding of mice and experimental procedures 
were carried out according to the UK Animals (Scientific Procedures) Act 1986 and 
the Animal Welfare and Ethical Review Body (AWERB) of the Wellcome Trust Sanger 
Institute. The experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 

Reagents. Fluorochrome- or biotin-labelled monoclonal antibodies (clones 
denoted in parentheses) against B220 (RA3-6B2), CD19 (6D5), CD3ε (145-2C11), 
CD5 (53-7.3), CD8α (53-6.7), TCRβ (B20.6), TCRγδ (GL3), NK1.1 (PK136), NKp46 
(29A1.4), CD49a (HMα1), CD49b (DX5), CD11b (M1/70), CD11c (N418), Gr1 
(RB6-8C5), Ter119 (TER-119), c-Kit (2B8), Flt3 (A2F10), Sca1 (D7), IL-7Rα 
(SB/199), CD25 (PC61), CD45.1 (A20), CD45.2 (104), CD45 (30-F11), CD244 
(2B4), α4β7 (DATK32), IL-33R (RMST2-2), CD122 (TM-β1), CD94 (18D3), 
CD226 (10E5), CXCR3 (CXCR3-173), CXCR5 (2G8), CCR9 (CW-1.2), CCR6 
(29-2L17), PD-1 (29F.1A12, J43 or RMP1-30), Gata3 (L50-823), RORγt 
(Q31-378), PLZF (R17-809) and IL-5 (TRFK5) were purchased from BD Biosciences, 
Biolegend or eBioscience. TCF-1 (C63D9) and TOX (REA473) were purchased 
from Cell Signaling Technology and Miltenyi Biotec, respectively. LEGENDplex 
Mouse Th Panel (Biolegend, 740005) was used for BALF cytokine measurement. 
The PD-1 antibody (J43, BE0033-2) and isotype control antibody (Armenian 
Hamster IgG, BE0091) were purchased from Bio X Cell for in vivo administration. 

Flow cytometry and cell sorting. Red blood cells were removed using ACK lysis 
buffer (Lonza, 10-548E). Cells were suspended in a solution of 2% (vol/vol) FBS in 
PBS. Fc receptors were blocked with anti-CD16 (2.4G2) before antibody labelling. 
Cells were stained with antibodies on ice for 20 min before washing. Intracellular 
staining was performed according to the instructions for the FOXP3 Fix/Perm Buffer 
Set (Biolegend, 421403). Cells were analysed on the LSRFortessa cell analyser (BD) 
or sorted on the MoFlo XDP cell sorter (Beckman Coulter) according to the 
manufacturers’ standard operating procedures. Data were analysed with 
FlowJo version 10.0.7 software (Tree Star). 

Preparation of cell suspensions. Bone marrow cells were isolated by gently 
crushing femurs and/or tibias before filtration (70-μm filter). Cells from lung and small 
intestine lamina propria were prepared according to the instructions of Lung 
Dissociation Kit (Miltenyi Biotec, 130-095-927) and Lamina Propria Dissociation 
Kit (Miltenyi Biotec, 130-097-410), respectively. The tissues were digested in a 
shaking water bath at 37°C for 15-20 min. 

Adoptive transfers in vivo. Cell populations highly purified by flow 
cytometry were injected intravenously into sublethally irradiated (1 × 450 rads) 
Rag2−/−Il2rg−/− recipient mice (CD45.1+) via the tail vein. The drinking water 
was supplemented with antibiotics for 2 weeks after irradiation. 

In vitro culture assay. PD-1hi, PD-1hiBcl11b+, PD-1hiBcl11b−, PD-1hiIL-25R+ 
and PD-1hiIL-25R− cells were individually sorted into 96-well plates containing 
Mitomycin C (10 μg ml−1)-treated OP9 stromal cells (70% confluent). Cells were 
cultured in RPMI 1640 medium (10% FBS and 80 μM 2-mercaptoethanol) 
supplemented with 20 ng ml−1 IL-7 (PeproTech) and 50 ng ml−1 mSCF (PeproTech). 
All of the cells were analysed on days 10–12 of culture. For the Il17rb overexpression 
rescue experiment, transduced Bcl11b-deficient PD-1hi cells were co-cultured with 
ILC2P (CD45.1) in the presence of 20 ng ml−1 IL-7, 20 ng ml−1 SCF and 20 ng ml−1 
IL-33 for 14 days. 

Retroviral transduction of ILC precursors. The Phoenix retroviral packaging 
system was used with the transfection reagent FuGENE 6 (Promega). Retroviral 
supernatants were centrifuged (2,000g) at 32°C for 90 min twice on 50 μg ml−1 
RetroNectin (TaKaRa)-precoated 24-well plates according to the manufacturer's 
instructions. PD-1hi ILC progenitors were transduced by centrifugation at 500g for 
30 min. Mouse Il17rb cDNA was bought from Origene (MC205843) and was 
cloned into the pMXs retroviral vector. The empty retroviral vector carrying GFP 
only was used as the control. 

Cell lines. OP9 and OP9-DL1 cells were kindly provided by J. C. Zúñiga-Pflücker 
(University of Toronto). Cells were maintained at 37°C in a humidified 5% CO2 
atmosphere. 

Influenza infection. The mice were anaesthetized with 3% isoflurane and were 
inoculated intranasally with 10^4 plaque-forming units of influenza A virus X31 
(H3N2) in 50 μl PBS. The virus was grown in and collected from embryonated 
chicken eggs (48–72 h). At day 5 after infection, lung and BALF (lavaged with 
0.4 ml PBS) were collected. 

Papain administration. The mice were anaesthetized with 3% isoflurane and 
then were intranasally administered 30 μg of papain (Acros Organics) in 30 μl 
PBS every day for 5 days. Lungs and BALF were collected and analysed at day 6. 

Lung histologic sections. The left lobes of the lungs were fixed with 4% 
paraformaldehyde and embedded in paraffin. 5-μm sections were used for staining 
with haematoxylin and eosin or periodic-acid–Schiff stain. Images were acquired with a 
Leica DM4000B LED microscope, an Olympus DP72 camera, and cellSens version 
1.4 imaging software. Only the brightness and contrast of the whole image were 
adjusted using Adobe Photoshop (CS5, Adobe Systems Inc.). 

Generation of single-cell RNA-seq library. The single-cell mRNA-seq library 
was generated following the SMART-seq2 protocol described in ref. 33. In short, 
single bone marrow progenitors were sorted into 96-well plates pre-filled with 
lysis buffer and external RNA spike-ins (Ambion) (1:500,000). First-strand 
synthesis and template-switching were then performed, followed by 25 cycles of 
pre-amplification. Complementary DNAs were purified by AMPure XP magnetic 
beads (Agencourt) using an automated robotic workstation (Zephyr). Quality of 
cDNAs was checked with the Bioanalyzer (Agilent) using a high-sensitivity DNA 
chip. Multiplex (96-plex) libraries were constructed and amplified using the Nextera 
XT library preparation kit (Illumina). The libraries were then pooled and purified 
with AMPure XP magnetic beads. The quality of the library was then assessed by 
the Bioanalyzer (Agilent) before submission to the DNA sequencing pipeline at 
the Wellcome Trust Sanger Institute. Paired-end 75-bp reads were generated by 
HiSeq2000 sequencers. 

Read alignment and gene quantification. The pre-processed BAM files with the 
same indexes were first merged and converted to raw FASTQ files. The FASTQ 
sequences were then realigned to a modified GRCm38 mouse genome with the 
sequences of the 92 ERCC spike-ins added. The alignment was performed using 
STAR (ref. 34) and sorted for expression quantification by HTSeq-count (ref. 35). The counting 
was performed with a modified gene annotation file (Ensembl GRCm38.75 + 92 
ERCC spike-ins) with parameter ‘-s no’ in default union mode. Cells with fewer than 
500,000 unique counts, fewer than 2,500 detected genes or more than 10% of counts 
mapping to mitochondrial genes were removed. The count matrix was then normalized 
by the size factors calculated using the external ERCC spike-ins by DESeq2 
(ref. 36). For the Lin−PD-1hi cell single-cell mRNA-seq dataset, cells with fewer than 
500,000 unique counts or more than 10% of counts mapping to mitochondrial genes were 
removed. 
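
The filtering and normalization steps above can be sketched in a few lines of Python. This is an illustration only, not the pipeline used in the study: the function names, the dictionary-based inputs and the mitochondrial gene set are hypothetical, and the size-factor function is a plain median-of-ratios calculation on spike-in counts, written on the assumption that this approximates what DESeq2 computes by default.

```python
from math import exp, log
from statistics import median

# Thresholds quoted in the Methods text.
MIN_UNIQUE_COUNTS = 500_000
MIN_GENES_DETECTED = 2_500
MAX_MITO_FRACTION = 0.10


def passes_qc(gene_counts, mito_genes, check_gene_number=True):
    """Return True if one cell passes the quality-control criteria.

    gene_counts: dict mapping gene name -> unique count for one cell
                 (hypothetical input layout; spike-ins handled separately).
    mito_genes:  set of mitochondrial gene names (hypothetical input).
    check_gene_number: set to False to mimic the Lin-PD-1hi dataset,
                 for which the text lists no gene-number criterion.
    """
    total = sum(gene_counts.values())
    if total < MIN_UNIQUE_COUNTS:
        return False
    detected = sum(1 for count in gene_counts.values() if count > 0)
    if check_gene_number and detected < MIN_GENES_DETECTED:
        return False
    mito = sum(c for gene, c in gene_counts.items() if gene in mito_genes)
    return mito / total <= MAX_MITO_FRACTION


def ercc_size_factors(ercc_counts):
    """Median-of-ratios size factors computed from spike-in counts only.

    ercc_counts: dict mapping cell id -> dict of spike-in name -> count.
    """
    cells = list(ercc_counts)
    # Use spike-ins detected in every cell so that all logarithms are defined.
    shared = [s for s in ercc_counts[cells[0]]
              if all(ercc_counts[c].get(s, 0) > 0 for c in cells)]
    if not shared:
        raise ValueError("no spike-in detected in every cell")
    # Log of the geometric mean of each spike-in across cells.
    log_ref = {s: sum(log(ercc_counts[c][s]) for c in cells) / len(cells)
               for s in shared}
    return {c: exp(median([log(ercc_counts[c][s]) - log_ref[s]
                           for s in shared]))
            for c in cells}
```

Dividing each cell's endogenous gene counts by its size factor would then give normalized counts on a scale set by the spike-ins rather than by total cellular RNA.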

Identification of highly variable genes and cell clustering. The details of the 
statistical model for testing highly variable genes have been described in ref. 11. The 
minimal biological dispersion parameter was set at 0.5 and genes with P value less 
than 0.1 were classified as highly variable. Cell clustering was performed using the 
‘Rtsne’ package. The count matrix containing both wild-type and mutant ILCs was 
first normalized by the size factors calculated using their corresponding ERCC 
count matrix. The matrix was then log2-transformed and only genes that are highly 
variable were included as input for clustering. The Rtsne package is an R wrapper of 
the Barnes–Hut t-SNE C++ implementation of van der Maaten and Hinton (ref. 37). The 
seed was set at 10 for reproducibility and perplexity set at 10 and 2 for the initial 
and Lin−PD-1hi dataset, respectively. Cell clusters were then visualized in the 
biaxial scatter plot and grouped into distinct clusters on the basis of their spatial 
proximity and specific expression pattern of ILC lineage regulators. Specific genes 
of a particular cluster were defined as genes showing normalized count expression 
higher than that of the other clusters by one fold and expression greater than 100 
normalized counts. 
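
The cluster-specific gene rule in the last sentence can be written as a short filter. The sketch below is illustrative and rests on two assumptions that the text does not settle: each cluster is summarized by its mean normalized count, and "higher by one fold" is read as at least twice the value in every other cluster. The function name and the input layout are hypothetical.

```python
def cluster_specific_genes(mean_expr, cluster, min_fold=2.0, min_expr=100.0):
    """List genes that are specific to one cluster.

    mean_expr: dict mapping gene -> dict of cluster -> mean normalized count
               (hypothetical layout; the per-cluster mean is an assumption).
    min_fold:  2.0 encodes the assumed reading of "higher by one fold".
    min_expr:  the 100 normalized-count threshold quoted in the text.
    """
    specific = []
    for gene, by_cluster in mean_expr.items():
        own = by_cluster[cluster]
        others = [v for name, v in by_cluster.items() if name != cluster]
        # Require both an absolute expression level and a fold margin
        # over every other cluster.
        if own > min_expr and all(own >= min_fold * v for v in others):
            specific.append(gene)
    return sorted(specific)
```

With made-up values such as `{"geneA": {"C8": 400, "C9": 150, "C10": 100}}`, calling `cluster_specific_genes(mean_expr, "C8")` would keep geneA, because 400 exceeds 100 and is at least twice the value in C9 and in C10.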

Gene set enrichment analysis. To identify gene signature enrichment between 
different cell clusters, the microarray gene expression data of different types of 
innate lymphoid cells were downloaded from Robinette et al. (ref. 38). The data were 
normalized using the Affy package in R. The normalized probe matrix was then 
converted into gene names using Biomart. The expression of specific genes was 
calculated as the average of the normalized signal of the different probes of the 
same gene. Specific gene sets of different types of innate lymphoid cells (ILC1/NK, 
ILC2 and ILC3/LTi) were defined as genes that showed higher average expression 
(>1 normalized expression unit) than the other two types of ILCs. The enrichment 
scores were then calculated using the javaGSEA application (version 2.2.1) available 
online (http://www.broadinstitute.org/gsea/downloads.jsp) with default settings. 
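
As a rough illustration of how the signature gene sets described above could be derived, the sketch below averages probe signals per gene and then keeps genes whose expression in one ILC type exceeds that in each of the other types by more than one normalized unit. The function names, the dictionary inputs and the example values are hypothetical, and comparing against each other type separately is an assumed reading of the text.

```python
from collections import defaultdict
from statistics import mean


def average_probes(probe_values, probe_to_gene):
    """Collapse probe-level signals to gene level by averaging.

    probe_values:  dict mapping probe id -> normalized signal.
    probe_to_gene: dict mapping probe id -> gene name (hypothetical mapping,
                   standing in for the Biomart conversion step).
    """
    grouped = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # skip probes without a gene annotation
            grouped[gene].append(value)
    return {gene: mean(values) for gene, values in grouped.items()}


def signature_genes(gene_expr, target, min_margin=1.0):
    """Genes expressed more than min_margin units higher in the target type.

    gene_expr: dict mapping cell type -> dict of gene -> average expression.
    """
    others = [cell_type for cell_type in gene_expr if cell_type != target]
    selected = []
    for gene, value in gene_expr[target].items():
        if all(gene in gene_expr[t] and value - gene_expr[t][gene] > min_margin
               for t in others):
            selected.append(gene)
    return sorted(selected)
```

The resulting gene lists are what would be supplied to the enrichment tool as gene sets; the enrichment scoring itself is not reproduced here.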

Statistical analysis. None of the experiments was blinded to allocation during 
experiments. The outcome assessment of the experiments corresponding to 
Fig. 4m, n was analysed in a single-blinded manner. All other outcome 
assessments were not blinded. The statistical analysis was conducted with Microsoft 
Excel or Prism 6 (GraphPad). P values were calculated using a two-tailed Student's 
t-test. 

Data availability. RNA-seq data that support the findings of this study have been 
deposited in the European Nucleotide Archive database and are accessible through 
the accession number ERP011804. All other relevant data are available from the 
corresponding author on request. 
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Extended Data Figure 1 | Quality control of sCRNA-seq data of ILC- 
progenitor-enriched bone marrow cells. a, FACS sorting strategies of 

the adult bone marrow cells from wild-type or Vav-Cre-Bel1 1b" mice. 
Flt3!° or IL-7Ra® cells were included to detect more ILC progenitors. 

Lin” Flt3!°/-IL-7Ral®/*a487+ cells were further divided into three 
populations (CD244*CD25~, CD244- CD25” and CD244~CD25*). We 
sorted two 96-well plates of CD244*CD25_, one plate of CD244- CD25 
and one plate of ILC2 progenitors CD244~- CD25” to include most ILC 
progenitors for sCRNA-seq. Two 96-well plates of Lin~ Flt3~IL-7Rat 
437+ bone marrow cells from Vav-Cre-Bcl11b!!' mice were purified to 
investigate early ILC2 development defects. Lin: CD19, CD3, CD4, CD5, 
CD8, TCR8, TCR, NK1.1, CD11b, Gr-1, CD11c and Ter119. b, Column 
charts show the fraction of cells passing specific quality control criteria in 
each plate: unique count mapped to annotated genes >500,000 (top panel); 
count mapped to mitochondrial-encoded genes <10% (middle panel); and 
number of annotated genes detected >2,500 (bottom panel). c, Percentage 
of cells passing all criteria. d, The percentages of ERCC RNA spike-ins in 
each plate. The black dots represent the mean of the dataset. e, The total 
number of unique counts mapped to annotated genes in different plates. 
The black dots represent the mean of the dataset. f, The fractions of the 92 
external ERCC RNA spike-ins in different plates. g, Kolmogorov-Smirnov 
test of individual ERCC spike-ins between the two plates did not detect 
any ERCC spike-ins showing significantly different (log₂ fold change > 1, 
and adjusted P value < 0.05) levels. The two vertical lines mark the log₂ 
fold change levels of −1 and 1, the horizontal line marks the adjusted P 
value threshold of 0.05. h, Identification of highly variable genes. Brown 
points represent annotated mouse genes. Blue points represent external 
ERCC RNA spike-ins. The magenta points represent the mouse genes that 
show significantly higher variability (false discovery rate < 0.1). The solid 
line represents the fit of the technical noise, the dashed line represents the 
50% biological CV (coefficient of variation). i, Biaxial t-SNE clustering of 
the sequenced wild-type cells. j, Column chart comparing the percentage 
of cells which show detectable mRNA expression of lineage markers in the 
wild-type bone marrow cells. 
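As a rough illustration of the three per-cell quality control criteria listed in panel b, the following Python sketch applies them to a plate of cells. The field names and the example numbers are hypothetical, and the choice to express mitochondrial counts as a fraction of the counts mapped to annotated genes is an assumption; the legend does not state the denominator.

```python
def passes_qc(cell, min_counts=500_000, max_mito_fraction=0.10, min_genes=2_500):
    """Apply the three per-cell thresholds quoted in the legend."""
    gene_counts = cell["gene_counts"]  # unique counts mapped to annotated genes
    if gene_counts <= min_counts:
        return False
    # Assumption: the mitochondrial percentage is taken relative to gene_counts.
    mito_fraction = cell["mito_counts"] / gene_counts
    return mito_fraction < max_mito_fraction and cell["genes_detected"] > min_genes


def fraction_passing(cells):
    """Fraction of cells in one plate that pass all criteria."""
    if not cells:
        return 0.0
    return sum(passes_qc(c) for c in cells) / len(cells)


# Hypothetical plate with four cells (made-up numbers, not study data).
plate = [
    {"gene_counts": 900_000, "mito_counts": 30_000, "genes_detected": 4_100},
    {"gene_counts": 350_000, "mito_counts": 10_000, "genes_detected": 3_000},
    {"gene_counts": 800_000, "mito_counts": 120_000, "genes_detected": 3_500},
    {"gene_counts": 700_000, "mito_counts": 20_000, "genes_detected": 1_900},
]
print(fraction_passing(plate))
```

In this toy plate only the first cell passes: the second has too few mapped counts, the third too many mitochondrial counts and the fourth too few detected genes.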
Extended Data Figure 2 | t-SNE plots showing expression and distribution of genes described in the manuscript. The colour key shows the 
expression level. 
Extended Data Figure 3 | Violin plots of genes highly enriched in specific clusters. The y axis indicates the log₂(normalized count + 1) expression 
levels. The black point indicates the mean expression level. The x axis indicates different clusters. 
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Extended Data Figure 4 | Heat map of TCR transcripts in different cell clusters. 
Extended Data Figure 5 | PD-1^hi expression marks ILC progenitors. 
a, Correlation of expression levels of Pdcd1 and Zbtb16 in C6 cells. 
Correlation was calculated by Pearson's method. The fit represents the line 
of linear regression. b, Violin plot showing the selected gene expression 
in PD-1^hi cells of C6. The y axis indicates the log₂(normalized count + 1) 
expression levels. The black point indicates the mean expression level. 
c, Expression of ILC markers in PD-1^hi cells was analysed by FACS. d, The 
in vivo developmental potential of PD-1^hi cells. CD45.1 Rag2⁻/⁻Il2rg⁻/⁻ 
recipients were injected with equal numbers of CD45.1⁻CD45.2⁺ 
PD-1^hi cells and CD45.1⁺CD45.2⁺ (F1 of CD45.1 and CD45.2 parents) 
common lymphoid progenitors (200-800 cells). The progenies of these 
donor cells were analysed by FACS after 5-7 weeks (n = 3 per donor 
cell type). e, Clonal analysis of PD-1^hi cells in vitro. The PD-1^hi, 
PD-1^hi Bcl11b⁺ and PD-1^hi Bcl11b⁻ bone marrow cells were FACS-purified 
and cultured on stromal cells and analysed by FACS. ILC1 was defined 
as CD45⁺NK1.1⁺Bcl11b⁻, ILC2 as CD45⁺NK1.1⁻Bcl11b⁺ and ILC3 as 
CD45⁺NK1.1⁻Bcl11b⁻RORγt⁺. Data are representatives of two (c) or 
three (d) independent experiments. 
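For readers who want to see what the Pearson correlation and linear regression fit in panel a involve, here is a minimal, dependency-free Python sketch. The function name and the example values are hypothetical; they only illustrate the calculation and do not reproduce the Pdcd1 and Zbtb16 data.

```python
from math import sqrt


def pearson_and_fit(x, y):
    """Return (r, slope, intercept) for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    r = sxy / sqrt(sxx * syy)            # Pearson correlation coefficient
    slope = sxy / sxx                    # least-squares slope of y on x
    intercept = mean_y - slope * mean_x  # fitted line passes through the means
    return r, slope, intercept


# Made-up, perfectly linear example values (not study data).
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = [1.0, 3.0, 5.0, 7.0]
r, slope, intercept = pearson_and_fit(x_demo, y_demo)
print(r, slope, intercept)
```

With expression data, x and y would be the log₂(normalized count + 1) values of the two genes across the same cells.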
Extended Data Figure 6 | Direct comparison of PD-1^hi and PLZF^hi ILC 
progenitors and dissection of the heterogeneity in the ILC progenitor 
compartments. a, FACS plots show PLZF^hi and PD-1^hi cells had the 
same development potential in vivo. Equal numbers of PD-1^hi cells 
and PLZF^hi cells were adoptively transferred into the same recipient and 
analysed 3-4 weeks later (n = 3 per donor cell type). b, Schematic diagram 
of the Bcl11b^tdTomato conditional knockout reporter allele, where the 
loxP-IRES-tdTomato cassette was inserted into the 3′ UTR of the Bcl11b 
locus. The other loxP site was in intron 3. Cre-loxP recombination would 
delete exon 4. The selection cassette for initial gene targeting was 
excised by Flpase-FRT recombination. c, Expression of Bcl11b in CHILPs 
was analysed in the Id2^Gfp;Bcl11b^tdTomato dual reporter mice (n = 6). 
d, Expression of Bcl11b in PD-1^hi bone marrow cells was analysed by 
FACS (n = 6). e, FACS analysis of the in vivo developmental potential 
of PD-1^hi Bcl11b⁻ cells (n = 3 per donor cell type). Common lymphoid 
progenitors were used as the donor cell control. PD-1^hi Bcl11b⁻ cells 
predominantly generated ILC1, ILC2 and ILC3. Data are representatives 
of two (a, e) or three (c, d) independent experiments. 
Extended Data Figure 7 | scRNA-seq analysis of PD-1^hi bone marrow cells. a, t-SNE clustering analysis of sequenced PD-1^hi cells detected two 
subpopulations. b, Heat map showing the hierarchical clustering result of PD-1^hi cells based on selected ILC regulators. The expression levels are log₂ 
transformed and ERCC-size factor normalized. 
Extended Data Figure 8 | scRNA-seq dissection of early ILC2 
development. a, Analysis of scRNA-seq data identified genes showing 
expression changes in C6, C7a, C8 and C9 cells. Change and distribution 
of expression of selected genes are shown. Il17rb and Bcl11b are among 
the genes showing spike expression from C6 to C9, whereas Il1rl1 
(IL-33R) showed steadily increased expression. The bottom t-SNE plots 
show expression of representative genes. b, The expression of IL-25R 
in PD-1^hi bone marrow cells in the Bcl11b^tdTomato mice (n = 3). c, Clonal 
differentiation assay of PD-1^hi IL-25R⁺ and PD-1^hi IL-25R⁻ cells. Cells were 
cultured on OP9-DL1 stromal cells in the presence of IL-7 (20.0 ng ml⁻¹) 
and SCF (50.0 ng ml⁻¹) and were analysed 10 days later. ILC1 was 
defined as CD45⁺NK1.1⁺Bcl11b⁻, ILC2 as CD45⁺NK1.1⁻Bcl11b⁺ and 
ILC3 as CD45⁺NK1.1⁻Bcl11b⁻RORγt⁺. d, FACS analysis of the in vivo 
developmental potential of PD-1^hi IL-25R⁺ cells. CD45.1 Rag2⁻/⁻Il2rg⁻/⁻ 
recipients were injected with equal numbers of CD45.1⁻CD45.2⁺ 
PD-1^hi IL-25R⁺ cells and CD45.1⁺CD45.2⁺ common lymphoid 
progenitors (100-500 of each) and the progenies of these populations were 
analysed 5-7 weeks later (n = 5 per group). e, Analysis of ILCs in 
Vav-Cre-Bcl11b^fl/fl mice. Lin⁻IL-33R⁺IL-7Rα⁺ ILC2s, Lin⁻KLRG1⁺IL-7Rα⁺ ILC2s 
or Lin⁻NK1.1⁺NKp46⁺ ILCs from the bone marrow, lung or siLP, 
respectively, were analysed (n = 4 per genotype). f, 'Natural' ILC2 (nILC2), 
'inflammatory' ILC2 (iILC2) and BALF IL-5 were analysed in 
Vav-Cre-Bcl11b^fl/fl and the control mice after administration of IL-25 (200 ng per 
mouse per day) for 3 consecutive days (n = 5 per treated group). Error 
bars denote s.e.m. g, FACS analysis of in vivo developmental potential of 
PD-1^hi cells from Vav-Cre-Bcl11b^fl/fl mice. CD45.1 Rag2⁻/⁻Il2rg⁻/⁻ mice 
were injected with CD45.2 PD-1^hi cells sorted from Bcl11b^fl/fl or 
Vav-Cre-Bcl11b^fl/fl mice. The progenies of these donor cells were analysed 
4-7 weeks later by FACS (n = 3 per genotype). Data are representatives of 
three (b) or two (d-g) independent experiments. *P < 0.05, **P < 0.01 
(two-tailed t-test). 
Extended Data Figure 9 | Restoration of development of 
Bcl11b-deficient PD-1^hi ILC progenitors to ILC2 by overexpressing IL-25R. 
a, FACS plots showing the expression of TCF-1 and Gata3 in mutant 
PD-1^hi bone marrow cells. Protein expression was measured by 
intracellular antibody staining. b, Expression patterns of Tox, Id2, Tcf7 
and Gata3 in the sequenced Bcl11b-deficient bone marrow cells. 
c, Overexpressing Il17rb in Bcl11b-deficient PD-1^hi bone marrow cells. 
The rescued cells were analysed by FACS for ILC2 surface markers. 
PD-1^hi cells sorted from Vav-Cre-Bcl11b^fl/fl mice were transduced with the 
Il17rb or control retrovirus. The infected cells were cultured on OP9-DL1 
stromal cells with the helper CD45.1 ILC2 progenitors in the presence 
of IL-25 (20.0 ng ml⁻¹), IL-7 (20.0 ng ml⁻¹) and SCF (50.0 ng ml⁻¹). The 
cells were collected and analysed after two weeks of culture. Data are 
representatives of two (a, c) independent experiments. 
Extended Data Figure 10 | PD-1^hi marks effector ILCs. a, FACS 
analysis of PD-1 expression on peripheral ILCs in steady-state mice 
(n = 3). b, Gating strategies of lung ILCs. Lung cNK cells were gated 
as Lin⁻Id2⁺IL-7Rα⁻NK1.1⁺NKp46⁺; lung ILC1s as 
Lin⁻Id2⁺IL-7Rα⁺NK1.1⁺Bcl11b⁻; lung ILC2s as Lin⁻Id2⁺IL-7Rα⁺NK1.1⁻Bcl11b⁺; 
and lung ILC3s as Lin⁻Id2⁺IL-7Rα⁺NK1.1⁻Bcl11b⁻. The data were 
from influenza-infected mice at 5 days after infection. cNKs account for at 
least half of the Lin⁻ leukocytes in these mice (n = 3). c, FACS analysis 
of PD-1 expression on CD3⁺ T cells, CD19⁺ B cells and peripheral ILCs 
after J43 treatment (n = 3). The tissues were collected at day 14 after J43 
treatment. d, FACS plot shows the recognition of different epitopes of 
PD-1 by PD-1 antibody clones RMP1-30 and J43. The majority of lung 
PD-1 ILC2s were stained with both RMP1-30 and J43. e, FACS analysis of 
lung PD-1^hi cNK, ILC1 and ILC3 at 7 days after infection (n = 3). f, BALF 
cytokines were quantitated as shown (n = 3 per group per time point). 
The four experimental groups were: Rag1⁻/⁻ mice with mock infection; 
Rag1⁻/⁻ mice infected with A/X-31 and treated with either an antibody 
isotype control or J43; Rag2⁻/⁻Il2rg⁻/⁻ mice infected with A/X-31 as 
the ILC-deficient control. g, More PD-1^hi cells were found after papain 
challenge (day 6) in Rag1⁻/⁻ mice (n = 3). h, Rag1⁻/⁻ mice were pretreated 
with PD-1 antibody J43 or the isotype control antibody for 3 days and 
then administered papain (intranasally) for 5 consecutive days. The 
lung tissue was collected at day 6 for analysis (n = 5 per treatment). Lung 
ILC2s were reduced in J43-treated mice. PD-1^hi and IL-5-producing ILC2 
were undetectable after J43 administration. Data are representatives of two 
(a-h) independent experiments. Error bars (f, h) denote s.e.m. 
*P < 0.05, **P < 0.01 (two-tailed t-test). 
On-target efficacy of a HIF-2α antagonist in 
preclinical kidney cancer models 


Hyejin Cho, Xinlin Du, James P. Rizzi, Ella Liberzon, Abhishek A. Chakraborty, Wenhua Gao, Ingrid Carvo, 
Sabina Signoretti, Richard K. Bruick, John A. Josey, Eli M. Wallace & William G. Kaelin Jr 


Clear cell renal cell carcinoma, the most common form of kidney 
cancer, is usually linked to inactivation of the pVHL tumour 
suppressor protein and consequent accumulation of the HIF-2α 
transcription factor (also known as EPAS1)¹. Here we show that 
a small molecule (PT2399) that directly inhibits HIF-2α causes 
tumour regression in preclinical mouse models of primary and 
metastatic pVHL-defective clear cell renal cell carcinoma in an 
on-target fashion. pVHL-defective clear cell renal cell carcinoma cell 
lines display unexpectedly variable sensitivity to PT2399, however, 
suggesting the need for predictive biomarkers to be developed to use 
this approach optimally in the clinic. 

HIF-2α, as a basic helix-loop-helix (bHLH)-Per/Arnt/Sim (PAS) 
domain protein, would usually be deemed undruggable. However, 
medicinal chemistry efforts have led to the development of drug-like 
chemicals such as PT2399 (Fig. 1a) that can bind directly to the HIF-2α 
PAS B domain (Fig. 1a and Extended Data Fig. 1a, b) and prevent 
HIF-2α from binding to ARNT (Fig. 1b and Extended Data Fig. 1c) and 
hence to DNA. PT2399 minimally affected a panel of 68 receptors, 
ion channels, and enzymes (Supplementary Table 4). 

Treating VHL⁻/⁻ 786-O clear cell renal cell carcinoma (ccRCC) 
cells with PT2399 repressed various HIF target genes in mRNA 
microarray (Fig. 1c), real-time PCR (Fig. 1e, f and Extended Data 
Fig. 1d), immunoblot (Fig. 1g) and ELISA (Fig. 1h) assays. PT2399 
did not suppress HIF-1α-specific targets such as BNIP3 (Fig. 1e and 
Extended Data Fig. 1e). As well as binding directly to HIF-2α, PT2399 
destabilized HIF-2α, which might enhance the effects of PT2399 on the 
DNA-binding activity of HIF-2α (Fig. 1e, g and Extended Data Fig. 1f). 
PT2399 downregulated gene sets that were induced by hypoxia, HIF, 
and c-Myc, consistent with reports that HIF-2α and c-Myc cooperate to 
promote ccRCC (Fig. 1d, Extended Data Fig. 1g and Supplementary 
Table 5). 

Next we made HIF-2α⁻/⁻ 786-O cells using the CRISPR-Cas9 
gene editing technique (Extended Data Fig. 2a). The cells proliferated 
under standard conditions (Extended Data Fig. 2c-f), consistent with 
the effects of suppression of HIF-2α by short hairpin (sh)RNA and 
treatment with pVHL in 786-O cells¹⁰. We then used a lentivirus to 
reintroduce wild-type HIF-2α or a HIF-2α missense mutant (S304M) 
with an occluded PT2399-binding pocket into these cells (Fig. 2a). 
The effects of PT2399 on HIF-responsive mRNAs were largely 
eliminated in cells lacking HIF-2α (Fig. 1c, d) or expressing HIF-2α 
S304M (Fig. 2b). 

PT2399 (up to 2 μM) minimally altered ccRCC cell line proliferation 
under standard cell culture conditions (Extended Data Fig. 2c, g-j). 
A larger dose of PT2399 (20 μM) caused off-target toxicity, as it also 
inhibited the proliferation of HIF-2α⁻/⁻ 786-O cells (Extended Data 
Fig. 2a, d-f) and other cancer cell lines with undetectable HIF-2α 
(Extended Data Fig. 2k-m). However, PT2399 inhibited 786-O cell 
growth in soft agar at doses of 0.2-2 μM (Fig. 2c, g and Extended Data 
Fig. 3a, f). This effect was specific because it was reversed by expression 
of HIF-2α S304M (Fig. 2d, g) and not seen in VHL⁺/⁺ SLR21 ccRCC 
cells (Extended Data Fig. 3c, f). Similarly, HIF-2α⁻/⁻ 786-O cells did 
not form soft agar colonies unless rescued with exogenous HIF-2α 
(Fig. 2e-g and Extended Data Fig. 2b). Therefore, PT2399 decreases 
HIF-dependent transcription and soft agar growth in an on-target 
manner. 

As a step towards imaging studies we infected 786-O cells, as well as 
isogenic cells expressing exogenous pVHL, with a lentivirus encoding 
firefly luciferase (Luc) driven by a HIF-responsive promoter 
(3×HRE-Luc). As expected, PT2399 inhibited Luc activity in the VHL⁻/⁻ cells 
but not in their pVHL-proficient counterparts (Fig. 3a, b). Conversely, 
the dioxygenase inhibitor dimethyloxaloylglycine (DMOG), 
which blocks the binding of pVHL to HIFα, induced Luc activity in 
the pVHL-proficient cells but not in the VHL⁻/⁻ cells (Fig. 3a). As 
expected, PT2399 did not affect Luc driven by constitutive promoters, 
such as the CMV promoter (Fig. 3c). 

Next, 3×HRE-Luc 786-O cells and CMV-Luc 786-O cells were 
injected into opposing kidneys of nude mice. Once tumours were 
established, as determined by serial bioluminescence imaging (BLI), 
the mice were given PT2399 or vehicle twice daily (Extended Data 
Fig. 1h, i). Two days of treatment with PT2399 decreased the 3×HRE-Luc 
signal by more than 60%, similar to its effects in vitro (Fig. 3d, f). 
These effects were not observed in the CMV-Luc tumours or in 
vehicle-treated mice (Fig. 3d, e). In PT2399-treated mice, the 
3×HRE-Luc signal recovered after a drug washout period and 
decreased again after drug rechallenge (Fig. 3g). Analysis of kidneys 
removed after 2 days of PT2399 treatment revealed decreased 
HIF-responsive mRNAs, decreased Ki-67 staining (representing decreased 
proliferation), increased caspase 3 cleavage (reflective of increased 
apoptosis) and decreased microvessel density (Extended Data 
Fig. 4a-e). 

To assess the antitumour efficacy of PT2399, CMV-Luc 786-O 
cells were grown orthotopically in nude mice; once tumours were 
established, the mice were treated with PT2399 or vehicle. As expected, 
tumours continued to grow in vehicle-treated mice, as shown by weekly 
BLI. In contrast, PT2399 caused tumour stasis or regression (Fig. 4a-c 
and Extended Data Fig. 4f), which correlated with decreased circulating 
tumour-derived VEGF (Extended Data Fig. 4g), decreased proliferation 
and decreased angiogenesis (Extended Data Fig. 4h). 
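The bioluminescence readouts discussed above are reported relative to each tumour's own pretreatment signal. A minimal Python sketch of that normalization, and of the percentage decrease quoted for the reporter signal, is given below. The function names and photon counts are hypothetical and serve only to illustrate the arithmetic; they are not measurements from the study.

```python
def normalize_to_baseline(signal):
    """Divide every BLI value by the first (pretreatment, day 0) value."""
    baseline = signal[0]
    if baseline <= 0:
        raise ValueError("baseline signal must be positive")
    return [value / baseline for value in signal]


def percent_decrease(before, after):
    """Percentage drop of a signal relative to its pretreatment value."""
    if before <= 0:
        raise ValueError("pretreatment signal must be positive")
    return 100.0 * (before - after) / before


# Made-up photon counts for one vehicle-treated and one drug-treated tumour.
vehicle = [2.0e6, 4.0e6, 8.0e6]
treated = [2.0e6, 1.0e6, 5.0e5]

print(normalize_to_baseline(vehicle))   # growth relative to day 0
print(normalize_to_baseline(treated))   # values below 1 indicate regression
print(percent_decrease(1.0e6, 3.5e5))   # a drop of more than 60%
```

Normalizing each tumour to itself removes differences in starting tumour size, so vehicle and treated groups can be compared on the same relative scale.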

Kidney-confined ccRCC can often be treated surgically. We therefore 
obtained a metastatic variant of 786-O cells (M2A cells) expressing Luc 
under an HSV TK promoter; these cells form diffuse lung colonies 
after tail vein injection¹¹. In this model, PT2399 still caused marked 
tumour regression and prolonged survival (Fig. 4d-f). Similar results 
were obtained when a limited number of cells were injected in an effort 
to better mimic established lung metastases (Extended Data Fig. 5a, b). 
Introducing HIF-2α S304M into M2A cells (Fig. 4g) conferred partial 
resistance to the pharmacodynamic (Extended Data Fig. 5c) and 
antitumour effects of PT2399 (Fig. 4h, i). 

PT2399 also inhibited the growth of VHL⁻/⁻ A498 ccRCC cells 
in soft agar and orthotopic tumour assays (Extended Data Fig. 3b, f 
and Extended Data Fig. 6), consistent with the effects of HIF-2α 
shRNA in these cells¹⁰, and inhibited the growth of a VHL⁻/⁻ 
ccRCC patient-derived xenograft (PDX) (Extended Data Fig. 5d, e). 
By contrast, PT2399 did not suppress orthotopic tumours formed 
by VHL⁻/⁻ UMRC-2 cells or VHL⁻/⁻ 769-P ccRCC cells (Fig. 5a-d) 
despite inhibiting HIF-2α dimerization and HIF-2α-dependent 
transcription in these cells with IC₅₀s comparable to those seen in 
786-O and A498 cells (Fig. 1b and Extended Data Fig. 7a-d) and 
Figure 3 | Pharmacodynamic effects of PT2399 
in vivo. a, b, Light emission, normalized to total 
cellular protein (a) and cell number (b) in 786-O 
3×HRE-Luc reporter cells expressing VHL or 
an empty vector (EV). The cells were treated with 
PT2399 overnight (a) or at 2 μM for the indicated 
durations (b). DMOG (1 mM) was included in a 
as a control; n = 3. RLU, relative luciferase units. 
c, RLU values of 786-O derivatives expressing 
luciferase driven by the indicated promoters and 
then exposed to either PT2399 or 1 mM DMOG; 
n = 3 biological replicates. d, Representative 
bioluminescent images (BLI) of mice with 
orthotopic tumours formed by 786-O 3×HRE-Luc 
reporter cells (left kidney) or 786-O CMV-Luc 
reporter cells (right kidney). Images were 
obtained before and after two days of treatment 
with 30 mg kg⁻¹ PT2399 given twice daily (n = 6 
mice from two independent experiments) or 
vehicle (n = 6 mice from two independent 
experiments) by oral gavage. e, f, Quantification 
of BLI from mice as in d; n = 4 mice from 
two independent experiments. Values were 
normalized to the pretreatment values for each 
treatment. g, Serial BLI of a mouse as in d treated 
with PT2399 (blue arrows). Shown on the y-axis 
are absolute photon counts for the 3×HRE-Luc 
tumour. Data shown as mean ± s.e.m. (a-c, e, f). 
Statistical significance was assessed using 
Mann-Whitney tests (f). 


despite effective suppression of HIF target genes in vivo (Extended 
Data Fig. 8a). 

To study this differential sensitivity further, we measured HIF-2α 
abundance and the response of selected HIF target genes to PT2399 
and to pVHL across a panel of VHL⁻/⁻ ccRCC cell lines. The 
PT2399-sensitive 786-O and A498 cells had higher HIF-2α levels than the 
insensitive UMRC-2 and 769-P cells (Fig. 5e). The PT2399-sensitive cells also 
exhibited greater inhibition of HIF target genes, as a percentage of basal 
expression, in response to PT2399 (Fig. 5f and Extended Data Fig. 8d, e) 
and, where tested, pVHL (Extended Data Fig. 8b, c). The latter finding 
further supported the idea that this differential sensitivity to PT2399 
reflects differences in HIF-2α dependence rather than differences in 
intracellular drug accumulation. Indeed, growth of UMRC-2 cells and 
769-P cells in soft agar was insensitive to Cas9-mediated inactivation 
of HIF-2α and to PT2399, unlike growth of 786-O and A498 cells 
(Fig. 5g-j and Extended Data Fig. 3c, f, g). Similarly, growth of VHL⁻/⁻ 
SKRC-20 and UMRC-6 ccRCC cells in soft agar was unaffected by 
genetic disruption of HIF-2α or treatment with PT2399 (Extended 
Data Fig. 3c-g). Growth of RCC10 cells in soft agar was unaffected 
by PT2399, but was suppressed in an on-target manner by 
Cas9-mediated loss of HIF-2α, despite the cells showing a similar decrease 
in HIF-2 target mRNAs in response to both (Extended Data Fig. 3c-g 
and Extended Data Fig. 9a-f). The importance of this discrepancy is 
unclear because the HIF-2α⁻/⁻ RCC10 cells quickly regained the ability 
to grow in soft agar after repeated culture despite persistent HIF-2α loss 
(see also below). Finally, we confirmed that genetic disruption of 
HIF-2α, like treatment with PT2399, did not affect orthotopic tumour 
growth by UMRC-2 cells (Extended Data Fig. 10). 

Differential HIF-2α dependence amongst ccRCC lines is not linked 
to their HIF-1α status because the insensitive cell lines 769-P and 
SKRC-20, like the sensitive cell lines 786-O and A498, lack wild-type 
HIF-1α. Moreover, Cas9-mediated ablation of HIF-1α in UMRC-2 
cells did not render them HIF-2α-dependent in soft agar assays 
(Extended Data Fig. 7e-g). 
Figure 4 | PT2399 antitumour activity. a, Representative BLI of 
orthotopic tumours formed by CMV-Luc 786-O cells before and after 
treatment with PT2399 (30 mg kg⁻¹) or vehicle twice daily by oral gavage 
for 30 days (vehicle, n = 15 mice from three independent experiments; 
PT2399, n = 13 mice from three independent experiments). b, Spider 
plot showing growth of tumours as in a determined by serial BLI. For 
each tumour the BLI values were normalized to the corresponding day 
0 value. V, tumours from vehicle-treated mice; PT, tumours from 
PT2399-treated mice. c, Quantification of BLI from mice as in a. The value for 
each tumour was normalized to the pretreatment value for that tumour. 
d, Representative BLI of lung colonies formed by M2A-Luc cells in mice 
treated with PT2399 (30 mg kg⁻¹) or vehicle twice daily by oral gavage 
(vehicle, n = 9; PT2399, n = 9 from two independent experiments). 


By contrast, HIF-2α dependence across the ccRCC lines we 
examined loosely correlated with their basal HIF-2α levels and the 
dependence of HIF target genes in those lines on HIF-2α itself. A 
caveat is that some of these lines might have lost their dependence on 
HIF-2α owing to prolonged passage in culture, especially as HIF-2α 
is not required under standard culture conditions. On the other hand, 
freshly explanted ccRCC PDXs also show variable sensitivity to PT2399 
(ref. 13). 

To begin to understand this differential HIF-2α dependence further, 
we focused on RCC10 cells because HIF target genes and soft agar 
growth are less dependent on HIF-2α in these cells than in 786-O and 
A498 cells despite their comparably high HIF-2α levels (Fig. 5e and 
Extended Data Figs 3c-g, 9b). We discovered that RCC10 cells harbour 
the canonical p53 R248W mutant (Extended Data Fig. 9g). The p53 
R248W mutation also arose spontaneously in a 786-O subclone made 
in our laboratory (Extended Data Fig. 9h) and was associated with 
acquired resistance to PT2399 (Extended Data Fig. 9i). p53 was not 
induced by DNA damage in the HIF-2-independent lines UMRC-2 
e, f, Quantification of BLI values (e) and Kaplan-Meier survival curves 
(f) from mice treated as in d (vehicle, n = 9 mice from two independent 
experiments; PT2399, n = 9 mice from two independent experiments). 
g, Immunoblots of M2A cells infected with a lentivirus encoding HIF-2α 
S304M or empty vector (EV). h, Representative BLI of lung colonies 
formed by M2A cells in g treated with PT2399 (30 mg kg⁻¹) or vehicle 
twice daily by oral gavage (for empty vector, n = 7 vehicle-treated and 
n = 8 PT2399-treated; for S304M, n = 9 vehicle-treated and n = 10 
PT2399-treated) from two independent experiments. i, Quantification of 
BLI values as in h. Data shown as median with range (c, e, i). Statistical 
significance was assessed using Mann-Whitney test (c and i), unpaired 
t-test (e) or log-rank test (f). 
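For orientation, the Kaplan-Meier survival curves mentioned in panel f are built from the product-limit estimator. The Python sketch below is a generic, simplified implementation under the usual assumption of right-censored data; the function name and the example follow-up times are hypothetical and do not come from the study, and the log-rank comparison between groups is not included.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each animal
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all animals that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored animals both leave the risk set
    return curve


# Made-up follow-up data: five animals, two of them censored.
demo_times = [5, 6, 6, 8, 10]
demo_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(demo_times, demo_events):
    print(t, round(s, 3))
```

Censored animals reduce the number at risk without lowering the curve, which is why the estimate only steps down at observed death times.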


and 769-P, suggesting that these cells also have p53 pathway mutations, 
but was induced in the HIF-2-independent lines UMRC-6 and Caki-2 
(Extended Data Fig. 9g, j). Of note, p53 was modestly induced by 
PT2399 and Cas9-mediated loss of HIF-2α in p53⁺/⁺ 786-O cells 
(Extended Data Fig. 9h, k), consistent with reports that HIF-2α 
constrains p53 activity in ccRCC. Therefore an intact p53 pathway 
seems necessary, but not sufficient, for HIF-2-dependence in ccRCC. 

Our findings suggest that the response of VHL⁻/⁻ ccRCC to 
HIF-2α antagonists will be variable and that predictive biomarkers, 
perhaps including measures of HIF-2α activity and p53 status, will 
be needed. The current view that p53 mutations are uncommon in 
ccRCC is based mainly on studies of primary tumours removed at 
nephrectomy. p53 pathway mutations might be more common in 
metastatic ccRCC, from which most ccRCC lines are derived, or might 
arise after ccRCC therapies¹⁶. Alternatively, some p53 mutations in 
ccRCC lines might have arisen ex vivo. Another important question 
is whether adding HIF-2α antagonists will enhance the activity of 
existing ccRCC drugs. 
Extended Data Figure 1 | Binding of PT2399 to PAS-B domain of 
human HIF-2α as determined by X-ray co-crystal structure. a, X-ray 
co-crystal of PT2399 (magenta) bound to complex formed by HIF-2α 
[…] crystal of PT2399 (magenta) with HIF-2α and ARNT PAS-B domains 
(zoomed in on HIF-2α PAS-B pocket). c, Immunoblots of anti-ARNT1 
[…] or control shRNA. e, HIF-2α-specific gene regulation in Hep3B cells; 
n = 3 biological replicates. f, Immunoblot analysis (top) and quantification 
(bottom) of HIF-2α in 786-O cells treated with DMSO or PT2399 for 
[…] n = 3 biological replicates. g, Enrichment plots for representative gene sets 
previously linked to HIF, hypoxia, or c-Myc. h, i, Plasma PT2399 levels […] 
Extended Data Figure 2 | Inhibition of cell proliferation by PT2399 
ex vivo. a, b, Immunoblot analysis of 786-O cells after CRISPR-based gene 
editing with control sgRNA or HIF-2α sgRNA (guides 4 and 6). In b, cells 
were also infected with an empty vector (EV) or a virus expressing an 
HIF-2α sgRNA guide 6-resistant HIF-2α cDNA. c-f, Cell proliferation of 
parental 786-O cells (c) and 786-O clones subjected to CRISPR-based gene 
editing with a control sgRNA (d) or in which endogenous HIF-2α was 
successfully inactivated using two different HIF-2α sgRNAs (guides 
4 and 6) (e, f); n = 3 biological replicates. g-j, Proliferation curves for 
786-M1A (g), UMRC-2 (h), Caki-1 (i), and Caki-2 cells (j) treated with 
the indicated concentrations of PT2399. k, Immunoblot analysis of the 
indicated cell lines. l, m, Proliferation curves for MDA-MB-231 (l) and 
A549 cells (m) treated with the indicated concentrations of PT2399; n = 3 
biological replicates. Data shown as mean ± s.d. (c-j, l, m). 
Extended Data Figure 3 | Effects of PT2399 on soft agar growth. 
a-c, Soft agar colonies formed by 786-O cells (a), A498 cells (b), and 
the indicated cell lines (c) in the presence of PT2399 at the indicated 
concentrations for 21 days; n = 3 biological replicates. d, Soft agar colonies 
formed by the indicated polyclonal cell line populations after CRISPR-based 
gene editing with control sgRNA or HIF-2α sgRNA (guides 4 and 6); 
n = 3 biological replicates. The reason for the differential sensitivity 
of RCC10 cells to PT2399 and the HIF-2α sgRNAs is not yet clear. 
e, Immunoblot analysis of the cells used in d. For SLR21 cells, 1 mM 
DMOG was added for 16 h to detect HIF-2α. f, g, Quantification of soft 
agar colonies formed in a-d and Fig. 5h, j, respectively; n = 3. Data shown 
as mean ± s.e.m. *P < 0.01 by two-tailed Student's t-tests (f, g). 
Extended Data Figure 4 | Pharmacodynamic effects of PT2399 
in vivo. a, Levels of the indicated mRNAs, normalized to ACTB, in 
786-O orthotopic tumours treated with PT2399 (30 mg kg⁻¹) (n = 3 mice 
from two independent experiments) or vehicle (n = 3 mice from two 
independent experiments) twice daily for two days in vivo. b, Immunoblot 
analysis of 786-O orthotopic tumours treated with PT2399 (30 mg kg⁻¹) 
or vehicle twice daily for two days in vivo; vehicle, n = 3 mice from two 
independent experiments; PT2399, n = 3 mice from two independent 
experiments. c, Quantification of Ki-67 staining (vehicle, n = 3; PT2399, 
n = 3 mice from two independent experiments). d, Immunohistochemistry 
of representative 786-O orthotopic tumours treated with PT2399 
(30 mg kg⁻¹) or vehicle twice daily for two days in vivo; for vehicle, n = 3 
and PT2399, n = 3. Scale bars, 50 μm. e, Microvessel density (vehicle, 
n = 5; PT2399, n = 3 mice from two independent experiments) from 
tumours as in d. f, g, Representative tumours at necropsy (f) and serum 
VEGF concentrations (vehicle, n = 10; PT2399, n = 11 mice from three 
independent experiments) (g) from mice as in Fig. 4a-c just before 
necropsy. h, Representative immunohistochemical staining of 786-O 
tumours treated as in Fig. 4a-c (vehicle, n = 4; PT2399, n = 5); scale bars, 
50 μm. Data shown as median with range (a, c, e, g). Statistical significance 
was assessed by using Mann-Whitney test (e) or unpaired t-test (g). 
NS, P > 0.05. 
Extended Data Figure 5 | Antitumour activity of PT2399 in lung 
colonization and PDX models. a, BLI of lung colonies formed after tail 
vein injection of 9,000 786-M2A cells treated with PT2399 (30 mg kg⁻¹) 
or vehicle twice daily by oral gavage. Treatment began at week 1. 
b, Quantification of BLI values as in a. Data shown as median with range 
(vehicle, n = 2 and PT2399, n = 3 mice from one experiment). c, Partial 
rescue of PT2399 pharmacodynamic effect by HIF-2α S304M; n = 3 
biological replicates. Levels of the indicated mRNAs, normalized to ACTB 
mRNA and then to DMSO treatment, in cells from Fig. 4g treated with 
PT2399 at the indicated concentrations for 48 h; n = 3. Data shown as 
mean ± s.e.m. *P < 0.05 by two-tailed Student's t-tests. Note that rescue 
is only partial, perhaps because these cells still produce endogenous 
wild-type HIF-2α in addition to exogenous HIF-2α S304M. d, Subcutaneous 
PDX measurements in mice randomized to the indicated treatments, 
including the FDA-approved ccRCC drug sunitinib, when the tumours 
reached 200-300 mm³. P < 0.05 for difference between PT2399 and vehicle 
(n = 8, unpaired t-test). e, Immunohistochemistry of PDX in d before 
treatment. Scale bars, 100 μm. Data shown as mean ± s.e.m. (d). 
Extended Data Figure 6 | Antitumour activity of PT2399 using A498 
cells. a, b, Representative BLI (a) and quantification of BLI measurements 
(b) of orthotopic tumours formed by A498 cells expressing firefly 
luciferase under the control of a CMV promoter before and after (30 days) 
treatment with PT2399 (30 mg kg⁻¹) or vehicle twice daily by oral gavage 
(vehicle, n = 10; PT2399, n = 10 mice from two independent experiments). 
c, d, Representative tumours (c) and tumour masses (d) at necropsy from 
mice treated as in a (vehicle, n = 10; PT2399, n = 10 mice). e, Serum 
VEGF concentrations from mice treated as in a at time of necropsy 
(vehicle, n = 4; PT2399, n = 4 mice from two independent experiments). 
f, Immunoblot of representative tumours from a; vehicle, n = 4; PT2399, 
n = 3. g, Levels of the indicated mRNAs, normalized to ACTB, in A498 
orthotopic tumours (vehicle, n = 4; PT2399, n = 3 mice from two 
independent experiments) treated as in a. Data shown as median with 
range (b, d, e, g). Statistical significance was assessed by using two-tailed 
Student's t-tests with Welch's correction (b, d) or Mann-Whitney test (e). 
Extended Data Figure 7 | Elimination of HIF-1α does not render 
UMRC-2 cells sensitive to PT2399 in soft agar assays. a–c, Firefly 
luciferase activity in the indicated cell lines after infection with a virus 
containing firefly luciferase under the control of a HIF-responsive 
(HRE-Luc) promoter (a, c) or CMV promoter (b) and treatment with the 
indicated concentrations of PT2399 for 16 h relative to DMSO-treated 
controls; n = 3 biological replicates. d, Immunoblot analysis of HRE-Luc- 
expressing UMRC-2 cells after CRISPR-based gene editing with control 
sgRNA or HIF-1α sgRNA (guides 2 and 3). Note that deletion of HIF-1α 
in c and d was used to eliminate the contribution of HIF-1α in a. e, f, 
Immunoblot (e) and mRNA levels (f) of UMRC-2 cells after CRISPR- 
based gene editing with control sgRNA or HIF-1α sgRNA (guides 2 and 3). 
In f mRNA levels were normalized to ACTB and then to the corresponding 
control sgRNA value; n = 3 biological replicates. g, Soft agar assays of the 
cells analysed in e and f in the presence of the indicated concentrations of 
PT2399; n = 3 biological replicates. Data shown as mean ± s.e.m. (a–c, f). 
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and pVHL. a, Levels of the indicated mRNAs, normalized to ACTB, in genes by PT2399 across a panel of ccRCC cell lines. Downregulation of 
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experiment. b, c, Downregulation of HIF-responsive mRNAs by PT2399 to ACTB (e) and then normalized to the untreated value for that cell line 
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to ACTB mRNA (c) and then normalized to the untreated value for included for comparison. Data shown as mean + s.d. 


that cell line (b); n =3 biological replicates. Data shown as median with 
Extended Data Figure 9 | HIF-2 dependence of RCC10 cells. 
a, Immunoblot analysis of anti-ARNT1 immunoprecipitates (IP) and 
whole cell extracts (input) prepared from RCC10 cells treated with 
increasing amounts of PT2399 or DMSO. C, control IP without ARNT1 
antibody. b, Levels of the indicated mRNAs, normalized to ACTB, in 
786-O, A498 and RCC10 cells treated with PT2399 at the indicated 
concentrations for 24 h or an effective HIF-2α sgRNA (sgHIF-2α-6), 
and then normalized to cells treated with DMSO or a control sgRNA, 
respectively. Data shown as mean ± s.d.; n = 3 biological replicates. 
c, Immunoblot analysis of RCC10 cells after CRISPR-based editing with 
HIF-2α sgRNAs or control sgRNA. d, e, Soft agar colonies formed by 
RCC10 cells as in c; n = 3 biological replicates. In e cells were engineered 
to express an exogenous sgRNA-resistant HIF-2α or empty vector (EV); 
n = 3. f, Soft agar colony counts as in e using ImageJ software. Colonies 
were counted using the following criteria: circularity range from 0.5 to 
1.0 and size (pixels²) from 200 to infinity. Data shown as mean ± s.e.m. 
Statistical significance was assessed by using two-tailed Student’s t-tests 
(f). *P < 0.05. g–k, p53 pathway status in ccRCC lines. g, j, Immunoblot 
analysis of the indicated cell lines treated for 16 h with etoposide or 
vehicle. Note overproduction of p53 in RCC10 cells and off-size p53 band 
in UMRC-2 cells. SLR21 cells are VHL+/+. Red, PT2399 sensitive in soft 
agar assays. Blue, PT2399 insensitive. RCC4 cells do not form soft agar 
colonies and are therefore indeterminate. h, Immunoblot analysis of 786-O 
cells that were infected with an empty lentivirus conferring puromycin 
resistance and then later found to have spontaneously acquired a p53 
mutation (R248W) compared to cells that retained wild-type p53. Cells 
were treated with PT2399 for 48 h or with nutlin-3 (30 μM) or etoposide 
(20 μM) for 16 h. i, Soft agar colony formation from cells in h treated with 
PT2399; n = 3 biological replicates. k, Immunoblot analysis of parental 
786-O cells that underwent CRISPR-based gene editing with a control 
sgRNA or HIF-2α sgRNA (guide 6) (as in Extended Data Fig. 2a) and were 
then treated with PT2399 for 58 h or treated with nutlin-3 (30 μM) 
or etoposide (20 μM) for 10 h. 
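
The colony-counting criteria in the legend above (circularity from 0.5 to 1.0, size of at least 200 pixels²) can be illustrated with a minimal Python sketch. It assumes the usual shape-descriptor definition of circularity, 4π × area / perimeter²; the function names and the example particles are hypothetical and are not taken from the paper or from ImageJ itself.

```python
import math

MIN_CIRCULARITY = 0.5   # lower circularity bound stated in the legend
MAX_CIRCULARITY = 1.0   # upper circularity bound stated in the legend
MIN_AREA_PX2 = 200.0    # minimum size in pixels^2 stated in the legend


def circularity(area, perimeter):
    """Return 4*pi*area / perimeter**2, capped at 1.0 to absorb rounding error."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return min(4.0 * math.pi * area / perimeter ** 2, 1.0)


def count_colonies(particles):
    """Count (area, perimeter) pairs that satisfy both stated criteria."""
    kept = 0
    for area, perimeter in particles:
        c = circularity(area, perimeter)
        if area >= MIN_AREA_PX2 and MIN_CIRCULARITY <= c <= MAX_CIRCULARITY:
            kept += 1
    return kept


# Hypothetical particles, given as (area in pixels^2, perimeter in pixels).
example_particles = [
    (math.pi * 10 ** 2, 2 * math.pi * 10),  # circle of radius 10: kept
    (400.0, 80.0),                          # 20 x 20 square: kept
    (100.0, 40.0),                          # 10 x 10 square: too small
    (400.0, 208.0),                         # 100 x 4 strip: not round enough
]
print(count_colonies(example_particles))
```

The size filter removes small debris, while the circularity filter removes elongated objects such as scratches or merged streaks.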
Extended Data Figure 10 | Loss of HIF-2α does not suppress UMRC-2 
orthotopic tumour growth. a–c, Tumours (a), tumour weights (b), and 
tumour immunoblots (c) at necropsy from mice after orthotopic injection 
of UMRC-2 cells that had undergone CRISPR-based editing with control 
sgRNA or sgHIF-2α-6 as in Fig. 5g; sgCon, n = 5; sgHIF-2α, n = 5 mice 
from two independent experiments. The reason for the variable HIF-2α 
levels in c is unknown but could reflect, at least partly, variable numbers 
of host-derived cells in the tumour samples. d, Levels of the indicated 
mRNAs, normalized to ACTB, in tumours from a–c. Data shown as 
median with range (b, d). Statistical significance was assessed by using 
two-tailed Student's t-tests (b) or Mann–Whitney test (d). Loss of HIF-2α 
did suppress subcutaneous tumour growth (data not shown). 
Clear cell renal cell carcinoma (ccRCC) is characterized by 
inactivation of the von Hippel-Lindau tumour suppressor gene 
(VHL)¹,². Because no other gene is mutated as frequently in ccRCC 
and VHL mutations are truncal³, VHL inactivation is regarded as 
the governing event⁴. VHL loss activates the HIF-2 transcription 
factor, and constitutive HIF-2 activity restores tumorigenesis in 
VHL-reconstituted ccRCC cells⁵. HIF-2 has been implicated in 
angiogenesis and multiple other processes⁶⁻⁹, but angiogenesis 
is the main target of drugs such as the tyrosine kinase inhibitor 
sunitinib¹⁰. HIF-2 has been regarded as undruggable¹¹. Here we 
use a tumourgraft/patient-derived xenograft platform¹²,¹³ to 
evaluate PT2399, a selective HIF-2 antagonist that was identified 
using a structure-based design approach. PT2399 dissociated HIF-2 
(an obligatory heterodimer of HIF-2α–HIF-1β)¹⁴ in human 
ccRCC cells and suppressed tumorigenesis in 56% (10 out of 18) 
of such lines. PT2399 had greater activity than sunitinib, was 
active in sunitinib-progressing tumours, and was better tolerated. 
Unexpectedly, some VHL-mutant ccRCCs were resistant to PT2399. 
Resistance occurred despite HIF-2 dissociation in tumours 
and evidence of Hif-2 inhibition in the mouse, as determined 
by suppression of circulating erythropoietin, a HIF-2 target¹⁵ 
and possible pharmacodynamic marker. We identified a HIF-2- 
dependent gene signature in sensitive tumours. Gene expression was 
largely unaffected by PT2399 in resistant tumours, illustrating the 
specificity of the drug. Sensitive tumours exhibited a distinguishing 
gene expression signature and generally higher levels of HIF-2α. 
Prolonged PT2399 treatment led to resistance. We identified 
binding site and second site suppressor mutations in HIF-2α and 
HIF-1β, respectively. Both mutations preserved HIF-2 dimers 
despite treatment with PT2399. Finally, an extensively pretreated 
patient whose tumour had given rise to a sensitive tumourgraft 
showed disease control for more than 11 months when treated with 
a close analogue of PT2399, PT2385. We validate HIF-2 as a target 
in ccRCC, show that some ccRCCs are HIF-2 independent, and set 
the stage for biomarker-driven clinical trials. 

The discovery of a 280 Å³ cavity within the PAS-B domain of 
HIF-2α¹⁶,¹⁷ and subsequent identification of compounds that 
bound this cavity and dissociated HIF-2α from HIF-1β¹⁸ led to an 
iterative structure-based program that identified selective, potent 
HIF-2α antagonists such as PT2399 (described in ref. 19) and 
PT2385 (ref. 20). 


To evaluate PT2399 in renal cancer, we tested a panel of 22 inde- 
pendently generated tumourgrafts'”’* (Extended Data Table 1). To 
assess the drug’s tolerability, we evaluated its effects on weight and 
blood counts in mice bearing these tumourgrafts. PT2399 did not 
induce weight loss, whereas sunitinib, at doses matching human 
exposures! , did (Fig. 1a). However, PT2399 caused modest anaemia 
and leukopaenia (Fig. 1b and Extended Data Fig. 1a). 

We hypothesized that the reduction in haemoglobin (2.0 g dl⁻¹; 
P = 0.0001) was due to a decrease in erythropoietin (EPO), which is 
regulated by HIF-2¹⁵. Consistent with this hypothesis, the number of 
red blood cell precursors was decreased by 35% (P < 0.0001; Fig. 1b) 
and the level of EPO, which may serve as a pharmacodynamic marker, 
was suppressed by 75% (P < 0.0001; Fig. 1b). 

PT2399 decreased tumour growth by 60% across all tumourgrafts 
evaluated (P < 0.0001; Fig. 1c). According to their responsiveness, 
tumourgrafts were classified into sensitive (tumour growth inhibition 
at last measurement > 80%), intermediate (40–80%), and resistant 
(< 40%; Extended Data Table 1). Forty-five percent of tumourgrafts 
were sensitive (10/22), 23% intermediate, and 32% resistant (Fig. 1d 
and Extended Data Fig. 1b, c). Sensitive tumours included tumours 
with aggressive sarcomatoid and rhabdoid features (Extended Data 
Table 1). Among ccRCC tumourgrafts, 56% (10/18) were sensitive. 
Unexpectedly, four ccRCCs were resistant to PT2399, including three 
with VHL mutations (Extended Data Table 1). 
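
The response categories used here can be written as a short Python sketch. The thresholds (> 80%, 40–80%, < 40%) come from the text; assigning values of exactly 40% or 80% to the intermediate group is an assumption, and the function name is hypothetical. The group counts of 10, 5 and 7 lines are those given in the Figure 3 legend.

```python
def classify_response(growth_inhibition_pct):
    """Map tumour growth inhibition (per cent) to a response category."""
    if growth_inhibition_pct > 80:
        return "sensitive"
    if growth_inhibition_pct >= 40:  # assumption: boundaries count as intermediate
        return "intermediate"
    return "resistant"


# Number of tumourgraft lines per category, as listed in the Figure 3 legend.
counts = {"sensitive": 10, "intermediate": 5, "resistant": 7}
total = sum(counts.values())

# Rounded percentage of the 22 lines falling in each category.
shares = {label: round(100 * n / total) for label, n in counts.items()}
print(shares)
```

With 22 lines in total, the rounded shares come to 45%, 23% and 32%, which matches the percentages quoted in the paragraph.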

PT2399 was more active than sunitinib (P = 0.0126) (Fig. 1c and 
Extended Data Fig. 1b) and inhibited tumour growth in several 
sunitinib-resistant tumours (Fig. 1d). There was no bias in treatment 
allocation, as treatment groups were balanced (pre-trial: tumour size, 
P = 0.11; tumour growth rate, P = 0.22; and mouse weight, P = 0.34). 
PT2399 reduced tumour cell density and increased fibrosis (Extended 
Data Fig. 1c–e). Ki67 immunohistochemistry (IHC) showed that 
PT2399 inhibited tumour cell proliferation by 3.5-fold (mean value 
change of −19.5 ± 2.4; P < 0.0001; Extended Data Fig. 1e, f). Inhibition 
of cell proliferation was also observed in live mice using 3′-[18F] 
fluoro-3′-deoxythymidine positron emission tomography/comput- 
erized tomography (PET/CT) scanning (Extended Data Fig. 1g, h). 
In addition, PT2399 collapsed the tumour vasculature, decreasing 
vascular area threefold (mean value change of −29.1 ± 6.1; P = 0.0011) 
(Extended Data Fig. 1e, f). To determine whether changes in vascular 
area were due to inhibition of tumour VEGF, we exploited the spe- 
cies difference between graft (human) and host (mouse). PT2399 


¹Kidney Cancer Program, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA. ²Department of Internal Medicine, University of 
Texas Southwestern Medical Center, Dallas, Texas 75390, USA. ³Department of Pathology, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510080, People’s Republic of China. 
⁴Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA. ⁵Parkland Health and Hospital System, Dallas, Texas 75235, USA. ⁶Department of 
Radiology, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA. ⁷New York Genome Center, New York, New York 10013, USA. ⁸Structural Biology Initiative, CUNY Advanced 
Science Research Center, New York, New York 10031, USA. ⁹Department of Chemistry and Biochemistry, City College of New York, New York, New York 10031, USA. ¹⁰Biochemistry, Chemistry and 
Biology Ph.D. Programs, Graduate Center, City University of New York, New York, New York 10016, USA. ¹¹Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, 
Texas 75390, USA. ¹²Peloton Therapeutics Inc., Dallas, Texas 75235, USA. ¹³Department of Pathology, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA. 


*These authors contributed equally to this work. 


112 | NATURE | VOL 539 | 3 NOVEMBER 2016 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Figure 1 graphic, panels a–d. Panel b: haemoglobin, reticulocytes and EPO for vehicle, PT2399 and sunitinib. Panel d: tumour volume against days for tumourgraft lines XP26, XP144, XP164, XP165, XP373, XP374, XP453, XP454, XP469, XP534, XP426, XP427 and XP466.] 


Figure 1 | Evaluation of PT2399 in RCC tumourgraft-bearing mice. 
a, Mean change in mouse body weights after treatment with vehicle 
(n = 89), PT2399 (100 mg kg⁻¹) by oral gavage every 12 h (n = 96) or 
sunitinib (10 mg kg⁻¹) by oral gavage every 12 h (n = 82). b, Haemoglobin 
levels, reticulocyte counts, and erythropoietin (EPO) levels in mice treated 
as indicated. Haemoglobin and reticulocytes: vehicle n = 52 mice, PT2399 
n = 58, sunitinib n = 53; EPO: vehicle n = 63, PT2399 n = 74, sunitinib 
n = 61. c, Mean per cent change in tumour volume in mice treated with 
vehicle (n = 89), PT2399 (n = 96), or sunitinib (n = 82). d, Growth curves 
of each tumourgraft line grouped according to PT2399 responsiveness 
into sensitive (growth inhibition (GI) at end of trial > 80%), intermediate 
(GI = 40–80%), or resistant (GI < 40%). Treatment starts on day 0 


suppressed circulating human VEGF by 93%, but mouse VEGF was 
unaffected (Extended Data Fig. 1i). Thus, tumour VEGF production, 
but not extratumoral VEGF, is HIF-2-dependent and inhibited by 
PT2399. This tumour selectivity represents a marked improvement 
over current angiogenesis inhibitors. PT2399 also inhibited VEGF pro- 
duction in tumours progressing on sunitinib (Extended Data Fig. 2). 

We evaluated the effects of PT2399 on HIF-2 in tumours. 
Immunoprecipitation of the HIF-1β subunit, which is shared by both 
HIF-2α and HIF-1α, showed that PT2399 specifically disassembled 
HIF-2 but not HIF-1 complexes (Fig. 2a). Similar results were observed 
using a proximity ligation assay (Fig. 2b). Correspondingly, PT2399 
reduced the expression of HIF-2 target genes (VEGFA, SERPINE1 
(encoding PAI-1), IGFBP3, CCND1 (encoding cyclin D1), TGFA, and 
SLC2A1 (encoding GLUT1); all comparisons P < 0.05; Fig. 2c), but not 
HIF-1 targets (CA9, PGK1, and LDHA). 

Notably, PT2399 did not affect the majority of HIF-2 target genes 
in resistant tumours (Fig. 2c). A modest decrease in VEGFA mRNA 
and values and error bars represent mean tumour volume ± s.e.m. To 
minimize bias (despite overestimation) volumes were calculated as 
length × width × height. Each tumourgraft line had n = 3–5 tumours per 
treatment group (vehicle n = 89 mice, PT2399 n = 96, sunitinib n = 82). 
a–c, Tests completed using a mixed model with compound symmetrical 
covariance structure for mice in the same tumourgraft line using vehicle 
as the reference group. Reticulocyte values were log-transformed for 
analysis; EPO levels were Box-Cox transformed; raw values are depicted 
in all figures. All bar charts depict the mean tumour volume with the error 
bar representing s.e.m., while all boxplots have median centre values. 
**P < 0.01; ***P < 0.001; ****P < 0.0001. 
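
The legend's volume formula (length × width × height) is a simple product of three caliper measurements. The Python sketch below computes it and, purely for comparison, an ellipsoid approximation (π/6 × length × width × height). The ellipsoid formula is an assumption introduced here to show why a box-shaped product overestimates a rounded tumour; it does not appear in the paper, and the function names and measurements are hypothetical.

```python
import math


def box_volume(length_mm, width_mm, height_mm):
    """Volume as stated in the legend: length x width x height (mm^3)."""
    return length_mm * width_mm * height_mm


def ellipsoid_volume(length_mm, width_mm, height_mm):
    """Comparison only (not from the paper): ellipsoid with the same three axes."""
    return math.pi / 6.0 * length_mm * width_mm * height_mm


# Hypothetical caliper measurements in millimetres.
length_mm, width_mm, height_mm = 10.0, 8.0, 6.0
box = box_volume(length_mm, width_mm, height_mm)
ellipsoid = ellipsoid_volume(length_mm, width_mm, height_mm)
print(box, round(ellipsoid, 1), round(box / ellipsoid, 2))
```

Under this comparison the product formula exceeds the ellipsoid estimate by a constant factor of 6/π for every tumour, so relative comparisons between treatment groups are unaffected.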


did not translate into lower circulating vascular endothelial growth 
factor (VEGF; Fig. 2d). However, as determined by a reduction in EPO 
(P=0.0002; Fig. 2d), Hif-2 was inhibited by PT2399 in mice with resist- 
ant tumours. Furthermore, immunoprecipitation experiments showed 
that HIF-2 complexes were dissociated in resistant tumours (Fig. 2a). 
Thus, somewhat unexpectedly, PT2399 disassembled HIF-2 in resistant 
tumours, but HIF-2 target genes were largely unaffected. 

To better characterize the effects of PT2399, we performed RNA 
sequencing (RNA-seq) on 46 tumours (Extended Data Tables 1, 2). 
In sensitive tumours, we identified 492 RNAs that were deregulated 
by PT2399 (FDR < 0.05; Fig. 2e and Extended Data Table 3). By con- 
trast, the same analysis in resistant tumours found no genes that were 
deregulated by PT2399 (Fig. 2e, f). Similar results were obtained by an 
independent, blinded analysis (H.G. and C.R.). The selective changes 
induced by PT2399 in sensitive tumours suggest that PT2399 sensitivity 
is linked to its ability to alter gene expression. Furthermore, the lack 
of gene expression changes in resistant tumours suggests that PT2399 
Figure 2 | PT2399 dissociates HIF-2 complexes in sensitive and resistant 
RCCs and induces changes in gene expression in sensitive tumours. 
a, Immunoprecipitation of HIF-1β from tumour lysates of sensitive 
(XP373), intermediate (XP391), and resistant (XP506 and XP169) tumours 
from mice treated with vehicle or PT2399. Samples are labelled with V for 
vehicle-treated or P for PT2399-treated followed by the mouse identifier. 
b, Proximity ligation assay detecting HIF-2α–HIF-1β or HIF-1α–HIF-1β 
heterodimers from vehicle- or PT2399-treated sensitive (XP374) or 
resistant (XP296) tumours and summary of results across responsive and 
resistant tumourgrafts. Images representative of quantitative data shown 
in graph. Summary includes analyses from 11 vehicle-treated tumours and 
11 PT2399-treated tumours (3 fields were analysed for each sample) in 5 
sensitive, 3 intermediate, and 3 resistant tumourgraft trials. Scale bars, 20 μm. 
c, qRT-PCR for the indicated HIF-2 target genes in PT2399-sensitive, 
-intermediate, and -resistant tumours treated with vehicle (blue), PT2399 
(red), or sunitinib (green). HIF-1 target genes CA9, PGK1, and LDHA 
included as negative controls. RT-PCR was repeated three times for each 
sample. Except for PGK1 and LDHA, samples were available for n = 58 
vehicle-treated tumours (sensitive: n = 11; intermediate: n = 21; resistant: 
n = 26), n = 62 PT2399-treated tumours (sensitive: n = 15; intermediate: 
n = 21; resistant: n = 26), and n = 52 sunitinib-treated tumours (sensitive: 
n = 10; intermediate: n = 23; resistant: n = 19). PGK1 and LDHA data 
were available for 24 tumours for each treatment group (sensitive: n = 6; 
intermediate: n = 8; resistant: n = 10). d, Circulating tumour-produced 
hVEGF and mouse EPO levels in mice with sensitive, intermediate, and 
resistant tumours treated with vehicle (blue), PT2399 (red), and sunitinib 
(green). Enzyme-linked immunosorbent assay (ELISA) data were 
generated for 63 vehicle-treated tumours (sensitive: n = 21; intermediate: 
n= 19; resistant: n = 23), 74 PT2399-treated tumours (sensitive: n = 27; 
intermediate: n = 21; resistant: n = 26), and 61 sunitinib-treated tumours 
(sensitive: n = 15; intermediate: n = 23; resistant: n = 23). e, Number of 
RNAs upregulated and downregulated by PT2399 in sensitive and resistant 
tumours. f, Heatmap representation from RNA-seq analysis showing genes 
differentially regulated by PT2399 in sensitive and resistant tumours. 
Removal of an unclassified tumour (XP169) from the resistant group did 
not affect conclusions. g, RNA-seq analyses showing increased expression 
of selected genes by PT2399 in sensitive tumours. b-d, g, Tests completed 
using a mixed model with compound symmetrical covariance structure 
for mice in the same tumourgraft line using vehicle as the reference group. 
qRT-PCR levels were log-transformed for analysis; EPO and hVEGF levels 
were Box-Cox transformed; RNA-seq levels were logo-transformed; raw 
values depicted in all graphs. All bar charts depict the mean with the error 
bar representing s.e.m., while all boxplots have median centre values. 

*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. See Supplementary 
Fig. 1 for gel source images. 
Figure 3 | Sensitive and resistant tumours can be distinguished by 
HIF-2α levels and gene expression signature. a, HIF-2α expression by 
immunohistochemistry (IHC) in sensitive (green) and resistant (red) 
tumours. Scale bars, 50 μm. Images representative of quantitative data 
shown in b. b, Quantification of HIF-2α-positive cells as determined 
by IHC in sensitive, intermediate, and resistant tumours from all 22 
tumourgraft lines (sensitive: n = 10; intermediate: n = 5; resistant: n = 7). 
c, Western blot analysis of sensitive (green) and resistant (red) tumourgraft 
lines. XP164 lysate loaded twice as a reference for comparison between the 
two membranes. d, qRT-PCR of EPAS1 (HIF-2α) expression in sensitive 
(n = 11) versus resistant (n = 26) vehicle-treated tumourgrafts. RT-PCR 


is highly specific. Consistent with PT2399 specificity, PT2399 had less 
effect on overall gene expression than did subtle differences among 
patients’ tumours (Extended Data Fig. 3a). 

Extensive studies have investigated HIF-2 target genes in ccRCC”*™?, 
However, by leveraging (i) PT 2399 specificity; (ii) RCC tumourgrafts, 
with minimal human stroma”; and (iii) an RNA-seq algorithm exclud- 
ing contaminating mouse (stromal) transcripts, we were able to define 
the HIF-2 program particularly accurately. Among the 492 RNAs that 
were deregulated in PT2399-sensitive tumours, 439 were protein cod- 
ing, and 271 were downregulated; these included previously identified 
canonical HIF-2 targets (IGFBP3, SERPINE1, VEGFA, CCND1) as well 
as other genes such as LOX, CXCR4, IL6 and DDIT4 (also known as 
REDD1) (Extended Data Fig. 3b). Pathway and gene set enrichment 
analyses showed downregulation of cell cycle, DNA replication, cell 
cycle checkpoint, and DNA repair processes (Extended Data Table 4). 
Regulation of DNA repair genes by HIF-2, as previously observed in 
cell lines®, may explain the resistance of cCRCC to radiotherapy. PT2399 
increased the expression of 168 protein-coding genes, including fibrosis- 
related genes (i.e. PDGFD), HIF1A (previously shown to be induced 
by HIF-2a knockdown”), and FBP1, a gluconeogenic gene recently 
reported to suppress RCC progression” (Fig. 2g and Extended Data 
Table 4). 

We sought to identify a biomarker that could distinguish between 
PT2399-sensitive and resistant tumours. We found that HIF-2a protein 
was expressed in 83% of cells in sensitive tumours compared to 23% in 
resistant tumours (P < 0.0001; Fig. 3a, b and Extended Data Fig. 4a). 
Although there were differences even within tumours, higher HIF-2a 
expression in sensitive tumours was observed by western blotting (Fig. 3c 
and Extended Data Fig. 4b) and RT-PCR (Fig. 3d). Lower, and at 
times undetectable, HIF-2a levels in resistant tumours may explain 
why PT2399 does not affect gene expression in this group. 

Next, we compared RNA-seq data sets between sensitive and resistant 
vehicle-treated tumours. Using a rigorous Wilcoxon test, we identified 
1,327 differentially expressed RNAs (Extended Data Table 3), including 
94 (76 mRNAs) that were uniformly over- or underexpressed across 
every sensitive versus resistant tumour sample (Extended Data Fig. 4c 
and Extended Data Table 3). GLI1, a transcription factor of the sonic 


was repeated three times for each sample. e, Candidate genes from 
RNA-seq analysis differentially expressed in sensitive and resistant 
tumours. b, ANOVA used to determine whether sensitive tumours were 
different from intermediate or resistant. Bar chart depicts the mean with 
the error bar representing s.e.m. d, e, Tests completed using a mixed 
model analysis with compound symmetrical covariance structure for mice 
in the same tumourgraft line. RNA-seq values were log)-transformed for 
analysis; raw values depicted in all graphs. Bar charts depict individual 
RNA-seq values, while all boxplots have median centre values. *P < 0.05; 
**P < 0.01; ****P < 0.0001. See Supplementary Fig. 1 for gel source 
images. 


hedgehog family, and PTHLH (parathyroid hormone-like hormone), 
a neuroendocrine peptide that has been implicated in epithelial- 
mesenchymal interactions and calcium ion transport, were uniformly 
overexpressed in sensitive tumours (Fig. 3e). Notably, HIF1A expression 
was increased in the resistant group (Fig. 3e). Increased expression 
of HIF-1α protein was also observed by immunohistochemistry in 
some, but not all, resistant tumours (Extended Data Fig. 4a). EZH2 
and MCAM were also uniformly overexpressed in resistant tumours 
(Fig. 3e). 

Overall, our data show that ccRCC can be classified into HIF-2- 
dependent and -independent tumours, and that these tumours differ 
in HIF-2a (and possibly HIF-1a) levels and in their baseline gene 
expression. These tumour subtypes did not correlate with BRCA1 
associated protein 1 (BAP1) and polybromo 1 (PBRM1) status” in 
this small series (Extended Data Table 1). Our results point to different 
mechanisms of tumorigenesis downstream of VHL that may underlie 
differences in tumour behaviour or responsiveness to therapy. 

Given the differences in gene expression, we investigated 
whether sensitive and resistant tumours showed differing imaging 
characteristics. We obtained CT scan images from patient tumours 
giving rise to tumourgrafts before surgery. The sensitive group was 
characterized by tumours with peripheral hypervascularity and a 
central non-enhancing area (typical of high-grade cCRCC”’) and, 
if present, tumour infiltration was focal (Extended Data Fig. 5). 
The resistant group was more heterogeneous, but several tumours 
were relatively hypovascular and diffusely infiltrating (Extended 
Data Fig. 5). 

We investigated whether sensitive tumours would acquire resistance. 
We exposed mice bearing tumours formed from a sensitive tumour- 
graft (XP164) to prolonged treatment with PT2399 or sunitinib. 
Sunitinib resistance developed within 60 days (Fig. 4a; compare to 
Fig. 1d), but resistance to PT2399 took > 120 days (Fig. 4a). PT2399 
resistance was associated with increased tumour vascularity and higher 
tumour VEGF production (Fig. 4b). We sequenced the HIF-2α gene 
(EPAS1) and identified a c.968G > A heterozygous mutation resulting 
in a G323E substitution in one tumour (Fig. 4c). The mutation was 
absent in a vehicle-treated tumour and in the second resistant tumour 
Figure 4 | Acquired resistance following prolonged PT2399 exposure. 
a, Tumour volumes from a cohort of mice of the XP164 tumourgraft 
line treated with vehicle (blue lines, n = 2; V3286 and V3299); sunitinib 
until the development of resistance (green lines, n = 2; S3295 and S3296; 
compare to Fig. 1d); or PT2399 (red lines, n = 2; P3283 and P3288). 
b, Circulating human VEGF levels in mice treated for the indicated 
number of days (d) showing increased tumour-produced VEGF with 
development of resistance (28 day bars, n = 3; all other bars, n = 2). 
The bar chart depicts the mean with the error bar representing s.e.m. 
c, Bidirectional chromatograms from tumourgrafts that developed 
resistance compared to controls: P3283 (c.968G > A in EPAS1 (HIF2A) 
leading to G323E) and P5123 (derived from P3288; c.1338C > A in ARNT 
(HIF1B) leading to F446L). d, Crystal structure of PAS-B domains from 
HIF-2α bound to HIF-1β (PDB entry 4ZP4; ref. 28) highlighting side 


(despite originating from the same parental tumour). Structural 
analyses of HIF-2'”8 showed that G323 is at the entrance to the cavity, 
where PT2399 binds (Fig. 4d). Akin to engineered mutations!”?, a 
glutamate side chain would prevent PT2399 access. Consistent with 
this notion, PT2399 failed to dissociate HIF-2 complexes in mutant 
tumours (Fig. 4e). 

We then sequenced HIF-1β from the second resistant tumour, and 
failed to identify a mutation. Nevertheless, HIF-2 complexes had 
reformed (Fig. 4e). The tumour was passaged in mice, which were 
maintained on PT2399, and remained resistant. Immunoprecipitation 
experiments again showed dimeric HIF-2 complexes (Fig. 4f). 
Sequencing of passaged tumours revealed a heterozygous c.1338C > A 
mutation resulting in a F446L substitution in the HIF-1β PAS-B domain 
(Fig. 4c). F446 is at the interface between HIF-1β and HIF-2α (Fig. 4d). 
We postulate that F446L functions as a second-site suppressor mutation 
chains of G323 (lining opening of PT2399 binding pocket in HIF-2α) and 
F446 (in HIF-1β at the interface with HIF-2α). In another structure 
(PDB entry 4GHI; ref. 18) the quaternary arrangement between HIF-2α 
and HIF-1β PAS-B domains differs, but F446 remains at the interface. 
e, HIF-1β immunoprecipitation from XP164 tumourgrafts before and after 
(red) development of resistance showing reformation of HIF-2α–HIF-1β 
complexes following the acquisition of resistance (V, vehicle; P, PT2399). 
f, HIF-1β immunoprecipitation from tumours of mice with HIF-2α or 
HIF-1β mutations (or wild-type controls) treated with PT2399 (n = 3 
mice per group). g, FLAG immunoprecipitation from HEK293T cells 
transfected with plasmids encoding FLAG-tagged HIF-1β (FLAG-HIF-1β; 
FLAG-HIF-1β-F446L) or HA-tagged HIF-2α (HA-HIF-2α; HA-HIF-2α- 
G323E) treated with vehicle or PT2399. See Supplementary Fig. 1 for gel 
source images. 


and that a more flexible side chain at the complex interface accom- 
modates conformational changes induced by PT2399, allowing 
drug-bound HIF-2α to bind to HIF-1β. Experimentally, both HIF-1β 
(F446L) and HIF-2α (G323E), when expressed in cells, were sufficient 
to preserve the formation of HIF-2 dimers despite treatment with 
PT2399, and their effects appeared additive (Fig. 4g). Overall, these 
results pave the way for second generation inhibitors or complementary 
approaches to leverage other potential drug-binding pockets that have 
been recently revealed”®. 
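
As a consistency check on the two resistance mutations reported above, the Python sketch below maps each coding-DNA position to its codon and lists the reference codons for which the stated base change gives the stated amino-acid change. It assumes coding-sequence numbering that starts at the A of the ATG start codon and uses the standard genetic code; the function names are hypothetical, and the actual codons in EPAS1 and ARNT are not given in the text.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_position(cdna_pos):
    """Return (codon number, position within codon), both 1-based."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def consistent_codons(cdna_pos, ref_base, alt_base, ref_aa, alt_aa):
    """List (reference codon, mutant codon) pairs that fit the reported change."""
    _, offset = codon_position(cdna_pos)
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[offset - 1] != ref_base:
            continue
        mutant = codon[:offset - 1] + alt_base + codon[offset:]
        if CODON_TABLE[mutant] == alt_aa:
            hits.append((codon, mutant))
    return hits


# c.968G>A in EPAS1, reported as G323E.
print(codon_position(968), consistent_codons(968, "G", "A", "G", "E"))
# c.1338C>A in ARNT, reported as F446L.
print(codon_position(1338), consistent_codons(1338, "C", "A", "F", "L"))
```

Position 968 falls on the second base of codon 323 and position 1338 on the third base of codon 446, so both nucleotide changes agree with the residue numbers and substitutions given in the text.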

Finally, a patient with metastatic ccRCC, whose tumour gave rise to a 
sensitive tumourgraft (XP165), enrolled in a phase 1 trial with PT2385°° 
(NCT02293980). The patient, a 47-year-old male, had originally pre- 
sented with omental and abdominal wall metastases following a radical 
nephrectomy of a stage III, high-grade ccRCC. After a failed attempt 
at surgical removal, he had received high-dose IL2, bevacizumab, 
sorafenib, everolimus, sunitinib, pazopanib, and axitinib. Despite exten- 
sive pretreatment, he remained free of progression on PT2385 for more 
than 11 months (Extended Data Fig. 6). These data validate HIF-2 as a 
target for ccRCC, provide insight into HIF-2-mediated tumorigenesis, 
establish variable tumour dependency on HIF-2 identifying different 
ccRCC subtypes and associated biomarkers that may be incorporated in 
future clinical trials, showcase the specificity of PT2399, and anticipate 
mechanisms of resistance. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Nomenclature. Throughout the manuscript and figures, XP refers to the tumour- 
graft line; V refers to vehicle; S refers to sunitinib; and P refers to PT2399. Numbers 
following V, S, or P refer to the mouse identifier (ear tag) of that sample. 

Drug trials. Drug trials in tumourgraft-bearing mice were done as previously
described!*. Briefly, ~64-mm³ fragments of tissue from stably growing orthotopic
tumourgrafts were implanted subcutaneously in 4-6-week-old female and
male non-obese diabetic (NOD)/severe combined immunodeficiency (SCID)
mice. When tumour volumes reached ~300-600 mm³, mice were segregated into
treatment groups (3-5 mice per group) based on (i) tumour volume, (ii) growth 
rate, and (iii) mouse weight. A sample size of five mice per treatment arm gave us 
80% power to detect a significant tumour volume differential on the 28th day after 
treatment between the reference arm and a treatment arm using a two-sample 
t-test, assuming a true 600-mm³ tumour volume difference with a standard deviation
of 250 and attrition margin of ~20%. Since the mixed model analysis uses
about eight repeated measures from each mouse, even with a few more covariates 
included in the model, the power will be similar or even higher. Vehicle (10% 
EtOH, 30% PEG400, 60% MCT (0.5% methyl cellulose, 0.5% Tween 80 (aq))) was 
administered by gavage every 12 h. Sunitinib (LC Laboratories) was administered
by oral gavage every 12 h at 10 mg kg⁻¹ in 0.5% carboxymethylcellulose (CMC)
in D5W. PT2399 (Peloton Therapeutics, Inc.) was administered at 100 mg kg⁻¹
by oral gavage in 10% EtOH, 30% PEG400, 60% MCT. Mouse weights were taken
weekly and treatment doses were adjusted weekly. Tumours were generally measured
twice a week using a digital caliper. While leading to an overestimation
in tumour volumes, to minimize bias!”, tumour volume was calculated by the
formula: tumour volume = l × w × h, where l is the largest dimension of the
tumour, w is the largest diameter perpendicular to l, and h is maximal height
of the tumour. Trials typically lasted 4 weeks, but this varied depending upon 
tumour growth rates. Overall, > 14,000 measurements were obtained. Assuming 
a digital caliper measurement error rate up to 10%, 99.8% of measurements were 
within protocol limits. Consideration was given to tumour growth rates, curve 
separation and the foreseeable need for additional mice for repeat experiments. 
Mice were monitored during treatment and provided appropriate veterinary 
care. In accordance with UT Southwestern’s Institutional Animal Care and Use
Committee (IACUC) policies, animals were euthanized within timeframes specified
by the veterinary staff once tumour diameters were greater than 2 cm. Mice
were also euthanized if they exhibited signs of adverse clinical health. A total of 
n = 22 tumourgraft trials were completed with n = 89 vehicle-treated tumours
(sensitive: n = 39; intermediate: n = 22; resistant: n = 28), n = 96 PT2399-treated
tumours (sensitive: n = 42; intermediate: n = 24; resistant: n = 30), and n = 82
sunitinib-treated tumours (sensitive: n = 32; intermediate: n = 22; resistant: n = 28).
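
The volume formula in the drug-trial description can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the ellipsoid formula (π/6 × l × w × h) is included purely as an assumed comparison to show why the l × w × h product overestimates volume; it is not stated in the text.

```python
import math


def tumour_volume_mm3(length_mm: float, width_mm: float, height_mm: float) -> float:
    """Volume as described in the text: l x w x h, in mm^3."""
    if min(length_mm, width_mm, height_mm) <= 0:
        raise ValueError("caliper dimensions must be positive")
    return length_mm * width_mm * height_mm


def ellipsoid_volume_mm3(length_mm: float, width_mm: float, height_mm: float) -> float:
    """Ellipsoid estimate (pi/6 x l x w x h); an assumption, shown only for comparison."""
    return math.pi / 6.0 * tumour_volume_mm3(length_mm, width_mm, height_mm)


if __name__ == "__main__":
    # Hypothetical caliper readings in mm
    box = tumour_volume_mm3(10.0, 8.0, 6.0)
    ell = ellipsoid_volume_mm3(10.0, 8.0, 6.0)
    print(f"l x w x h: {box:.1f} mm^3; ellipsoid: {ell:.1f} mm^3")
```

Because the same formula is applied to every arm, the overestimation affects vehicle and treated tumours alike, which is consistent with the stated aim of minimizing bias.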
Blood cell counts and haemoglobin measurements. Complete blood counts
(CBC) (platelets, white blood cells, neutrophils, lymphocytes, and haemoglobin)
were measured at the end of ~28-day trials and run on an IDEXX ProCyte Dx
analyser. CBCs were available for 17 tumourgraft trials, with 52 vehicle-treated
mice (sensitive: n = 10; intermediate: n = 21; resistant: n = 21), 58 PT2399-treated
mice (sensitive: n = 13; intermediate: n = 19; resistant: n = 26), and 53
sunitinib-treated mice (sensitive: n = 8; intermediate: n = 22; resistant: n = 23).
PET/CT. 3′-[18F]fluoro-3′-deoxythymidine ([18F]FLT) was synthesized at 160 °C
for 10 min using a GE FXN module through the nucleophilic substitution reaction
between 2,3′-anhydro-5′-O-benzoyl-2′-deoxythymidine and [18F]KF (potassium
fluoride) in DMSO, followed by deprotecting the benzoyl group in 1 N NaOH
solution. The product was separated and purified by HPLC. The injection dose
of [18F]FLT was prepared in saline containing 10% ethanol. Small animal PET/CT
imaging studies were performed on a Siemens Inveon PET/CT Multimodality 
System. PET/CT scans were conducted on mice with both orthotopic and subcu- 
taneous tumours. Orthotopic tumourgrafts were implanted using 2-3 pieces of 
2 x 2 x 2-mm tissue underneath the left renal capsule of NOD/SCID mice. Once 
tumours became palpable, a baseline PET/CT scan was performed, and within 72h, 
PT2399 treatment was started. PT2399 was continued for 8-10 days, after which 
a second PET/CT scan was performed to assess tumour response. After injection 
with 0.12 mCi of [18F]FLT via the tail vein, and a 60-min wait period to allow for the
radiotracer’s distribution and uptake, mice were anaesthetized using 3% isoflurane,
which was decreased to 2% during imaging. CT imaging was acquired at 80 kV and
500 µA with a focal spot of 58 µm. The PET imaging was acquired for 500 s directly
following the acquisition of CT data. CT images were reconstructed with Cobra
Reconstruction Software, and PET images were reconstructed using the OSEM3D
algorithm. Reconstructed CT and PET images were fused and analysed using the 
manufacturer's software. For quantification, regions of interest were drawn aided 
by CT images and then quantitatively expressed as per cent injected dose per gram 
of tissue (%ID/g). 
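
As a rough illustration of the %ID/g quantity, the sketch below converts a region-of-interest activity concentration into per cent injected dose per gram. It is not the manufacturer's software: the function name is hypothetical, a tissue density of 1 g/ml is assumed, and decay correction is omitted for brevity.

```python
MICROCURIE_PER_MILLICURIE = 1000.0


def percent_injected_dose_per_gram(
    roi_uci_per_ml: float,
    injected_mci: float,
    tissue_density_g_per_ml: float = 1.0,  # assumption: soft tissue ~1 g/ml
) -> float:
    """Return %ID/g for a region of interest (no decay correction applied)."""
    if injected_mci <= 0 or tissue_density_g_per_ml <= 0:
        raise ValueError("injected dose and density must be positive")
    injected_uci = injected_mci * MICROCURIE_PER_MILLICURIE
    uci_per_gram = roi_uci_per_ml / tissue_density_g_per_ml
    return 100.0 * uci_per_gram / injected_uci


if __name__ == "__main__":
    # Hypothetical ROI reading of 6 uCi/ml after the 0.12 mCi injection described above
    print(percent_injected_dose_per_gram(6.0, 0.12))
```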

Immunohistochemistry. Immunohistochemistry (IHC) was performed using
Dako Autostainer Link 48. The HIF-1α and HIF-2α immunohistochemical
procedures and interpretations were standardized based on expression profiles
in well-characterized cell lines (786-O, 786-O empty vector, and 786-O VHL-reconstituted
cell lines) and human ccRCC tissue with known expression for
these two proteins by western blot. Multiple commercially available antibodies
were evaluated and the antibodies with the most consistent results were selected for
further studies. Briefly, for HIF-1α and HIF-2α staining, after hydration, antigen
retrieval was accomplished with EnVision FLEX Target Retrieval Solution,
Low pH (K800521, Dako) in a Dakocytomation Pascal pressure cooker; Ki67
and CD31 antigen retrieval was done using a Dako PT Link. Slides were incubated
in 3% hydrogen peroxide for 10 min. Primary antibodies were added and
incubated for 40 min at room temperature. Primary antibodies: HIF-1α (1:500,
NB100-105, Novus), HIF-2α (1:200, sc-46691, Santa Cruz), Ki67 (ready-to-use,
IR-626, Dako) and CD31 (1:200, LS-B1932, LifeSpan BioSciences). After rinsing
with wash buffer, EnVision FLEX mouse/rabbit linker (K802121/K800921, Dako)
was applied to the tissue and incubated for 10 min. Secondary antibody, EnVision
FLEX/HRP (K800021, Dako), was incubated for 20 min. Sections were then
processed using the EnVision FLEX Substrate Working Solution for 10 min
followed by dehydration in a standard ethanol-xylene series and mounting medium
(8310-4, Thermo Scientific). IHC of HIF-1α and HIF-2α was performed on
pre-treated tumourgraft tissue for n = 22 tumourgraft lines. Appropriate positive
and negative controls were used with each run of immunostaining. The percentage
of tumour cells in the entire section examined was recorded by a pathologist blinded
to the western blot results. Only a 2+ or 3+ nuclear positive reaction was considered
as positive expression (staining scale: 0 = no staining, 1 = weak, 2 = moderate,
3 = strong).

Proliferation index and microvessel quantification. To assess tumour prolif- 
eration index, we performed immunostaining for Ki67, and to assess tumour 
microvessels, we performed CD31 immunostaining on tumours following treatment
with vehicle or PT2399. IHC was completed on n = 10 sensitive tumourgraft
lines, with n = 28 vehicle-treated tumours and n = 31 PT2399-treated
tumours. Slides were digitally scanned using an Aperio Scanscope AT Turbo and 
reviewed using the Aperio eSlide Manager (ver. 12.0.0.5040) and Imagescope (ver. 
12.1.0.5029) systems (Leica Biosystems). For Ki67, Aperio Genie (ver. 11.2) pattern 
recognition software was used to identify and select tumour areas for quantitative 
analysis with the Aperio Nuclear algorithm (version 11.2), yielding a percentage 
of tumour nuclei positive for Ki67. In a small subset of tumours where Genie 
inadequately identified tumour cells, representative tumour regions were manu- 
ally selected (tumour necrosis areas were avoided) and reanalysed. Quantitative 
measurements of microvessels, including density and average lumen area, were 
obtained using the Aperio Microvessel algorithm (version 11.2) from manually 
selected representative tumour regions. 

Real-time PCR. RT-PCR data were generated for 16 tumourgraft trials, except
for CA9 and LDHA, which were evaluated in 12 tumourgraft trials. Three RT-PCR
reactions were run concurrently for each tumour. Total RNA was isolated as
described previously (ref. 31). cDNA was synthesized using iScript Reverse Transcription
Supermix for RT-qPCR (170-8841, Bio-Rad). qRT-PCR was performed on a
Bio-Rad CFX96 Real-Time PCR system using iTaq Universal SYBR Green SMX
(1725124, Bio-Rad). Primers were synthesized by Invitrogen. Primer sequences
are available upon request.

VHL methylation. HIF2-I sensitive ccRCC tumourgrafts that had wild-type VHL
status (XP164, XP373, XP453, and XP454) were tested for VHL methylation using
the Affymetrix Promoter Methylation PCR Kit (MP1100).
Immunoprecipitation and western blot. Tumour tissue was lysed in IP buffer
containing 25 mM Tris-HCl pH 7.4, 150 mM NaCl, 1% NP-40, 1 mM EDTA, 5%
glycerol, 0.5 mM DTT, with 3-4 freeze-thaw cycles. 10-20% of the lysate was
saved for input; 40 µg was mixed with 3× loading buffer (10% SDS, 33.3% glycerol,
300 mM DTT, 0.2% bromophenol blue) for input. After pre-clearing the lysate
with 50 µl of a 1:1 solution of recombinant protein G-sepharose 4B (101242, Life
Technologies) for 1 h, 1 mg protein was mixed with 20 µl ARNT/HIF-1β antibody
(sc-55526, Santa Cruz) and rocked overnight at 4 °C. 30 µl protein G-sepharose
4B equilibrated with IP buffer was then added, rocked for 1 h at 4 °C, and spun
at 3,000 rpm for 10 s. The supernatant was removed and the beads washed three
times with IP buffer containing DTT. 20 µl of 1× loading buffer was added to the
beads and vortexed gently, then boiled for 5 min and spun at max speed for 5 min.
The entire sample was loaded for western blot analysis. For western blot analysis,
both HIF-1α antibody (A300-286A, Bethyl) and HIF-2α antibody (NB100-122,
Novus) were diluted at 1:1,000 in 5% BSA in TBS and incubated overnight at 4 °C.
Tubulin antibody (T5168, Sigma) was diluted at 1:5,000. Primary antibodies were
detected using horseradish peroxidase-conjugated secondary antibodies (31430,
31460, Pierce) followed by exposure to enhanced chemiluminescence substrate
(mixing 1:1 solution 1 (2.5 mM luminol, 0.4 mM p-coumaric acid, 0.1 M Tris-HCl)
and solution 2 (0.015% H2O2, 0.1 M Tris-HCl)).

Sanger sequencing. Primer sequences are available upon request. 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Transfections. HEK293T cells (ATCC; no perceived need for authentication; neg- 
ative for mycoplasma) were cotransfected with the indicated expression plasmids 
using Lipofectamine 2000 (Invitrogen) following the manufacturer's instructions. 
After 36 h, cells were treated with PT2399 (10 µM) at 37 °C for 5 h, harvested for
immunoprecipitation with anti-FLAG beads (A2220-1ML, Sigma) and then subjected
to western blot analysis. Plasmid laboratory database: #930 (pcDNA3.1
Flag-HIF1β), #931 (pcDNA3.1 Flag-HIF1β [F446L]), #932 (pLVX-HA-HIF-2α-IRES-zsGreen),
and #933 (pLVX-HA-HIF-2α [G323E]-IRES-zsGreen).

In silico structural analysis. The G323E and F446L mutations were evaluated using 
PyMOL and Protein Data Bank 4ZP4 (ref. 28). 

ELISA. Mouse VEGF (MMV00), human VEGF (DVE00), and mouse erythropoietin
(MEP00B) ELISA kits were from R&D Systems. Briefly, 50 µl assay diluent was
added to each well. 50 µl of either standard, control, or sample was then added to a
well. For mEPO and mVEGF ELISA, the serum was diluted twofold and fivefold,
respectively, with calibrator diluent. For hVEGF, no dilution was performed. Plates
were incubated for 2 h at room temperature on a horizontal orbital microplate
shaker, then aspirated and washed for a total of five washes. After the last wash,
100 µl of mEPO/mVEGF/hVEGF conjugate was added to each well and incubated
for 2 h at room temperature on a shaker. Plates were washed five times with wash
buffer and 100 µl of substrate solution was added to each well and incubated for
30 min at room temperature, during which time the plates were covered to protect 
from the light. Stop solution was then added to each well, with gentle tapping to 
ensure thorough mixing. The optical density of each well was determined using 
a microplate reader set to 450nm. Wavelength correction was set to 540 nm. The 
final optical density (OD) value was obtained by subtracting readings at 540 nm 
from the readings at 450 nm. ELISA data were generated for a total of 20 tumour- 
graft trials. 
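
The wavelength correction and dilution steps above amount to simple arithmetic, sketched here in Python. The function names and the example readings are hypothetical; converting a corrected OD into a concentration requires the kit's standard curve, which is not reproduced here.

```python
# Dilution factors taken from the text: mEPO twofold, mVEGF fivefold, hVEGF undiluted.
DILUTION_FACTOR = {"mEPO": 2.0, "mVEGF": 5.0, "hVEGF": 1.0}


def corrected_od(od_450: float, od_540: float) -> float:
    """Final OD: the 450 nm reading minus the 540 nm correction reading."""
    return od_450 - od_540


def undiluted_concentration(assay: str, measured_conc: float) -> float:
    """Scale a concentration read off the standard curve back to undiluted serum."""
    return measured_conc * DILUTION_FACTOR[assay]


if __name__ == "__main__":
    # Hypothetical plate readings and standard-curve result
    print(corrected_od(1.250, 0.050))
    print(undiluted_concentration("mVEGF", 40.0))
```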

Proximity ligation assay. Mouse anti-HIF-1α (NB100-105, Novus), mouse
anti-HIF-2α (sc-46691X, Santa Cruz) and rabbit anti-ARNT/HIF-1β (A302-765A,
Bethyl) were used. Primary antibodies were concentrated and buffers
were exchanged using a Vivaspin 500 Centrifugal Concentrator (VS0131, Fisher
Scientific). Antibodies were diluted to 1 mg ml⁻¹ in PBS. Primary antibody conjugation
was done with a Duolink In situ Probemaker MINUS/PLUS kit (DUO92010
& DUO92009, Sigma-Aldrich). Briefly, 2 µl of conjugation buffer was added to 20 µl
of the antibody (1 mg ml⁻¹), mixed gently, transferred to one vial of lyophilized
oligonucleotide (PLUS or MINUS), and incubated at room temperature overnight.
2 µl of stop reagent was then added to the reaction and incubated at room temperature
for 30 min. 24 µl of storage solution was added and the conjugation stored at
4 °C. Tumour tissue was blocked with PBS-T (0.1% Triton X-100) + 1% BSA for
30 min after antigen retrieval. Conjugated HIF1-α-MINUS, HIF2-α-MINUS and
ARNT-PLUS were diluted in blocking buffer containing 1× assay reagent (20×)
at a dilution of 1:50, 1:50, and 1:200, respectively. The mixture was allowed to 
sit for 20 min at room temperature before diluted primary antibody was added 
to each sample. Slides were incubated in a humidity chamber overnight at 4°C. 
Duolink In situ Detection Reagents Red (DUO92008, Sigma-Aldrich) were used for 
signal detection. Briefly, slides were washed with wash buffer A, ligation solution 
containing ligase at a 1:40 dilution was added, and slides were incubated in a pre- 
heated humidity chamber for 30 min at 37°C. After washing in 1x wash buffer A 
with gentle agitation, amplification solution containing polymerase was added at a 
1:80 dilution, and slides were then incubated in a pre-heated humidity chamber for 
100 min at 37°C. After washing in 1 x wash buffer B and then 0.01 x wash buffer 
B, slides were dried at room temperature in the dark and mounted with a cover 
slip using a minimal volume of Duolink In situ Mounting Medium with DAPI 
(DUO82040, Sigma-Aldrich). After approximately 15 min, slides were analysed 
by fluorescence microscopy (Olympus) using a 40x objective. Image analysis was 
done with the ImageJ 1.48V program. Pictures of three fields for each sample were 
used. At least 100 cells of each sample were counted. 

RNA sequencing. 23 vehicle- and 23 PT2399-treated tumour RNA samples, including 
5 sensitive XPs (XP144, XP164, XP373, XP374, and XP453) and 4 resistant 
XPs (XP169, XP296, XP490, and XP506), underwent RNA sequencing at the 
New York Genome Center. RNA sequencing libraries were prepared using the 
Illumina TruSeq Stranded mRNA Sample Preparation Kit. Briefly, 500 ng total 
RNA was purified by oligo-dT beads selecting for polyadenylated RNA species 
and fragmented by divalent cations under elevated temperature. The fragmented 
RNA underwent first strand synthesis using reverse transcriptase and random 
primers. Second strand synthesis created the cDNA fragments using DNA poly- 
merase I. Following RNaseH treatment, the cDNA fragments went through end 
repair, adenylation of the 3’ ends, and ligation of adapters. The cDNA library was 
enriched using eight cycles of PCR and purified. Quality control consisted of 
assaying the final library size using the Agilent Bioanalyzer and quantifying the
final library by RT-PCR and PicoGreen (fluorescence) methods. A single peak
between 250 and 350 bp indicated a properly constructed and amplified library
ready for sequencing. Sequencing was performed on a HiSeq 2500 using v4 SBS
chemistry according to the Illumina protocol, as described (ref. 32). Sequencing libraries
were loaded onto the HiSeq 2500 flowcell for clustering on the cBot using the
instrument-specific clustering protocol. Given HiSeq 2500 capabilities (200-250M
passed filter 2 × 50-bp sequencing reads per flow cell lane), we sequenced 5 samples
per lane in order to obtain a minimum of 50M PF reads per sample. With one
exception, > 100 million reads were obtained per sample (median 146,644,355;
95% distribution-free CI: 142,380,928-151,324,826; Extended Data Table 2). Any
gene with more than 50 reads in any sample was kept; only genes that had low reads
in all of the samples were removed. This left 20,667 genes after removal of pseudogenes.
cDNA sequences were aligned to a combined index of mouse and human
reference sequences with STAR v 2.4.0c. Mouse reads were filtered out and the
remaining reads were re-mapped to the NCBI hg37 using STAR aligner (v2.3.1z) (ref. 33).
Quantification of genes annotated in Gencode v19 was performed using HTSeq (ref. 34).
Picard and RSeQC (ref. 35) were used to collect QC metrics (http://broadinstitute.github.io/picard/).
Differential gene expression analysis was measured using edgeR (ref. 36).
A false discovery rate (FDR) cutoff of 0.05 was applied to identify the statistically
significant genes between comparison groups. FDR was calculated using the
Benjamini and Hochberg method (ref. 37) for adjusting for multiple hypothesis testing.
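
The read-count filter and the Benjamini and Hochberg adjustment described above can be illustrated with a short, dependency-free Python sketch. It is not the authors' pipeline (which used HTSeq and edgeR); the function names and toy inputs are hypothetical.

```python
from typing import Dict, List


def keep_expressed_genes(counts: Dict[str, List[int]], min_reads: int = 50) -> Dict[str, List[int]]:
    """Keep a gene if it has more than `min_reads` reads in at least one sample."""
    return {gene: c for gene, c in counts.items() if any(x > min_reads for x in c)}


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return BH-adjusted P values (FDR), in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    toy_counts = {"GENE_A": [0, 51], "GENE_B": [50, 50], "GENE_C": [3, 0]}
    print(sorted(keep_expressed_genes(toy_counts)))
    fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
    print([q < 0.05 for q in fdr])
```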

RNA-seq data were deposited into the Sequence Read Archive (SRP073253). 
For RNA-seq data, the tumourgraft number is preceded by an S for sensitive or 
R for resistant followed by treatment and ear tag. SRS1395449 (S144-P4340), 
SRS1395526 (S144-P4342), SRS1397028 (S164-P3281), SRS1397038 (S164-P3287), 
SRS1397048 (S164-P3297), SRS1397056 (S373-P4241), SRS1397057 (S373-P4244), 
SRS1397060 (S373-P4250), SRS1397059 (S374-P5172), SRS1397058 (S453-P5103), 
SRS1396986 (S453-P5104), SRS1396988 (S453-P5109), SRS1396987 (S144-V4352),
SRS1396989 (S144-V4377), SRS1396991 (S164-V3290), SRS1396993 (S164-V3294),
SRS1397021 (S164-V3298), SRS1397024 (S373-V4232), SRS1397025 (S373-V4236), 
SRS1397026 (S373-V4237), SRS1397027 (S374-V5170), SRS1397029 (S453-V5105), 
SRS1397031 (S453-V5107), SRS1397030 (S453-V5108), SRS1397032 (R169-P5231), 
SRS1397033 (R169-P5240), SRS1397034 (R169-P5241), SRS1397035 (R296-P4512), 
SRS1397036 (R296-P4531), SRS1397037 (R490-P3207), SRS1397039 (R490-P3210), 
SRS1397040 (R490-P3214), SRS1397042 (R506-P4734), SRS1397041 (R506-P4735), 
SRS1397043 (R506-P4736), SRS1397044 (R169-V5230), SRS1397045 (R169-V5235), 
SRS1397046 (R169-V5239), SRS1397047 (R296-V4519), SRS1397049 (R296-V4524), 
SRS1397050 (R490-V3211), SRS1397052 (R490-V3218), SRS1397051 (R490-V3224), 
SRS1397053 (R506-V4743), SRS1397054 (R506-V4745), SRS1397055 (R506-V4777). 
Statistical analyses. Apart from the RNA-seq analysis, all reported P values were 
obtained from two-tailed tests at the 0.05 significance level. All bar charts depict 
the mean with the error bar representing s.e.m., while all boxplots have median 
centre values with fences extending to the greatest value inside the upper and 
lower fences (1.5 × IQR away from the 75th and 25th percentiles, respectively).
Transformations were used where indicated to meet normality assumptions for 
analysis. These tests were completed using SAS 9.4 (SAS Institute Inc.). Except 
where indicated, the experiments were not randomized and the investigators were 
not blinded to allocation during experiments and outcome assessment. 
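
The boxplot convention described above (median centre, whiskers reaching the most extreme values inside fences placed 1.5 × IQR beyond the quartiles) can be sketched as follows. This is a minimal Python illustration, not the SAS procedure used by the authors; the quartile method (`inclusive`) is an assumption and SAS may use a different percentile definition.

```python
import statistics
from typing import Dict, List


def boxplot_summary(values: List[float]) -> Dict[str, object]:
    """Median, quartiles, whisker ends and outliers using 1.5 x IQR fences."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value inside the lower fence
        "whisker_high": inside[-1],  # greatest value inside the upper fence
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


if __name__ == "__main__":
    print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```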
Regulatory. Written informed consent was obtained from the patient participating 
in the phase 1 clinical trial ‘A Phase 1, Dose-Escalation Trial of PT2385 Tablets 
In Patients With Advanced Clear Cell Renal Cell Carcinoma’ (NCT02293980). 
UT Southwestern IACUC-approved animal protocol, APN 2015-100932, includes
all live vertebrate experimental procedures. 
Extended Data Figure 1 | Effects of PT2399 on human RCC-bearing
mice. a, Platelet, white blood cell, neutrophil, and lymphocyte counts
from tumourgraft-bearing mice treated with vehicle (n = 52), PT2399
(n = 58), or sunitinib (n = 53) at the end of the drug trial period
(~28 days). Low lymphocyte levels throughout are consistent with
expected levels in age- and sex-matched NOD/SCID mice. b, Tumour
growth trend lines for sensitive, intermediate, and resistant groups after 
controlling for baseline tumour volume (refer to Fig. 1d for individual 
curves). c, Representative gross images of tumours from sensitive (XP164 
and XP373; green) and resistant (XP169 and XP490; red) lines at the end 
of the drug trial. d, Representative haematoxylin and eosin-stained images 
illustrating different effects of PT2399 on sensitive tumours including 
patchy intercellular fibrosis and hyalinization (open arrow heads), reduced 
tumour necrosis (red arrows), decreased tumour cell density (XP164 and 
XP469), reduced nuclear-to-cytoplasmic ratio (XP469), cell ballooning 
(filled arrow), and dystrophic calcification (blue stars). Scale bars, 50 µm.
e, Summary of histopathological changes induced by PT2399 in 10
sensitive tumourgraft lines represented as number of tumours (n)
compared to the total or as mean ± s.e. in 28 vehicle-treated tumours
compared to 31 PT2399-treated tumours. MVD, microvessel density
per mm²; MLA, mean lumen area (µm²). PT2399 collapsed tumour
vasculature without decreasing the number of CD31-expressing
endothelial cells. f, Top, IHC for Ki67 in tumours harvested from sensitive
(XP144 and XP373) or resistant (XP530 and XP506) tumours following
treatment with vehicle or PT2399. Bottom, haematoxylin and eosin
staining and IHC for CD31 in sensitive tumours (XP373 and XP469)
treated with vehicle or PT2399. Scale bars, 100 µm. g, Representative
[18F]FLT-PET/CT images of mice with subcutaneous tumourgrafts
treated with either vehicle or PT2399. Yellow arrows point to tumours.
h, Representative [18F]FLT-PET/CT images of XP144 mice with orthotopic
tumours before and after treatment with PT2399 for 10 days. Yellow
arrowheads, kidney tumours. White asterisks, intestine. FLT uptake
in tumour compared to normal kidney reduced by 19% after 10-day
treatment (n = 3; paired t-test, P = 0.001). i, Human and mouse VEGF
levels in plasma as determined by ELISA in different treatment groups 
(vehicle: n = 63; PT2399: n= 74; sunitinib: n = 61). a, i, Tests completed 
using a mixed model analysis with compound symmetrical covariance 
structure for mice in the same tumourgraft line using vehicle as the 
reference group. b, Trend lines were obtained from a mixed model 
analysis for each response group using an autoregressive (1) covariance 
structure for the longitudinal measurements on each mouse, compound 
symmetry for mice within the same tumourgraft line, and controlled for 
baseline volume. e, Continuous measures were analysed using a mixed 
model with compound symmetrical covariance structure for mice in
the same tumourgraft line and using vehicle treatment as the reference
group. Specifically for categorical variables, a binomial test was used to
test whether the proportion of tumours affected by PT2399 compared to
vehicle was different from 10%. hVEGF and mVEGF levels were Box-Cox
transformed; raw values depicted in all graphs. All boxplots have median 
centre values. *P < 0.05; ***P < 0.001; ****P < 0.0001. 
tumours progressing on sunitinib. a, Tumour volumes in mice from
sensitive lines (XP374 or XP144) switched from vehicle or sunitinib
to PT2399 as indicated (bottom black arrows). b, Circulating tumour-produced
hVEGF levels in mice treated with vehicle, sunitinib, or
sunitinib followed by PT2399. The Wilcoxon rank-sum test was used to
determine whether sunitinib (n = 4) or sunitinib followed by PT2399
(n = 6) were different from vehicle (n = 4). *P < 0.05. Boxplots have
median centre values. c, Representative images of haematoxylin and eosin
and Ki67 staining of tumours from mice (XP144) treated with vehicle or
sunitinib (left) and from tumours following a switch to PT2399 (right).
Scale bars, 100 µm.
Extended Data Figure 3 | RNA-seq analyses of vehicle and PT2399-treated
tumourgrafts. a, Unsupervised clustering analyses of all
tumourgraft samples (sensitive (S) and resistant (R), both vehicle (V)-
and PT2399 (P)-treated) showing clustering by tumourgraft line. b, RNA
sequencing in sensitive tumourgrafts evaluating the effects of PT2399
on selected HIF-2 target genes. All tests completed using mixed model
analysis with compound symmetrical covariance structure for mice in the
same tumourgraft line. Values were log2-transformed for analysis; raw
values depicted in all graphs as individual bars.
Extended Data Figure 4 | HIF-2α and HIF-1α levels in sensitive and
resistant tumourgrafts. a, HIF-2α and HIF-1α IHC. 786-O cells, which
express high levels of HIF-2α, shown as controls. Scale bars, 100 µm.
b, Western blot analyses showing heterogeneity within tumours but with
overall similar results (compare to Fig. 3c). Green, sensitive; red, resistant.
Asterisks, underloaded samples. c, Heatmap from RNA-seq analysis
showing differentially expressed genes in sensitive (S) versus resistant (R)
tumourgrafts based on uniform cutoff (see Extended Data Table 3).
See Supplementary Fig. 1 for gel source images.
[Figure panels: Sensitive, Intermediate, Resistant; ‘Typical’ high-grade ccRCC, focal infiltrating, diffuse infiltrating; mass-like, non-mass-like]

Extended Data Figure 5 | Evaluation of imaging characteristics of
tumours in patients corresponding to sensitive, intermediate, and
resistant tumourgrafts. CT scan images from patient tumours that gave
rise to tumourgrafts according to tumourgraft sensitivity to PT2399.
Tumours were classified into masses with peripheral hypervascularity and
a central non-enhancing area (blue outline), focally infiltrating (brown
outline) and diffuse infiltrating (yellow outline). Three of the seven
resistant tumours presented as non-mass-like, infiltrative neoplasms (red
arrows) whereas another tumour presented with both a largely necrotic
renal mass and retroperitoneal lymph nodes (black outline; white arrows).
Extended Data Figure 6 | Prolonged disease control in heavily pretreated patient with metastatic ccRCC with sensitive (XP165) tumourgraft. CT
images of selected lesions in patient treated with highly related HIF-2 inhibitor (PT2385) in phase 1 clinical trial showing overall stability in the size of
lesions over time. Start of treatment, day 0.
Extended Data Table 1 | Tumourgraft features



Intermediate 


n/a 


pT3aN1M1 


n/a 


43 (0.0144) 


Response XP NO. Histology Fuhrman Tissue Stage at VHL BAP1 PBRM1 Relative GI% RNA 
Grade presentation status (IHC) (IHC) (p value) seq 
Sensitive XP26 ccRCC 2 Adrenal pT1aNxMx mut mut wt 87 (0.0003) 
ccRCC 4 pT2aN1Mx mut wt wt 98 (<0.0001) Y 
me [ee [| ome | rime [we [me | | onem |  | 
ccRCC 3 Abd wall pT3bNxMx 112 (<0.0001) 
ccRCC Tu Thr pT3aN1iM1 103 (<0.0001) 
ccRCC*# Kidney pT4NxMx 109 (<0.0001) 
ecRCC 3 Tu Thr pT3bNOMx 110 (<0.0001) Y 
eet [eee [2 | oe | wm [we [nw | we | ose || 
ee [fe [ome Pome [oe [me [ese 
mae [oe fe Pome me Po Po [one 
LN wt wt 


pT3bNOMx wt mut 45 (0.0018) 
4 pT4N1Mx wt wt 44 (0.0273) 
4 pT4NiMx wt wt 54 (0.0206) 


n, 


Resistant XP169 Unclassified 


3 
/a 
4 
/a 


67 (0.0030) 


0 (0.01195) Y 
39 (0.11) 
29 (0.30) Y 


XP462 Unclassified n Kidney pT3aNOM1 wt wt mut 29 (0.11) 
XP490 ccRCC*# 4 Kidney pT3aN1M1 mut wt wt 39 (0.89) Y 
XP506 ccRCC 3 Ascites pT3aN1Mx wt? wt wt 20 (0.76) Y
XP530 Unclassified n/a Kidney pT3bNOMx wt n/a n/a 2 (0.68) 


Fuhrman grade of primary tumour and stage at presentation (metachronous metastasis may have developed). Tissue, engrafted tissue; IHC, immunohistochemistry from tumourgrafts; GI, growth
inhibition; ccRCC, clear-cell renal cell carcinoma; tRCC, translocation renal cell carcinoma; Abd, abdominal; Tu Thr, tumour thrombus; LN, lymph node; mut, mutant; wt, wild-type.
@Independent tumours from same patient.
*Sarcomatoid differentiation.
*Rhabdoid features.
$PT2399-treated mice had greater relative growth than vehicle-treated mice.
*Promoter methylation.
#Promoter not methylated.
Extended Data Table 2 | RNA sequencing read data


Samples      Read Count     Samples      Read Count
S144-P4340   131,078,351    R169-P5231   150,980,881
S144-P4342   127,945,953    R169-P5240   146,751,739
S164-P3281   121,045,606    R169-P5241   144,959,159
S164-P3287   128,070,443    R296-P4512   151,324,826
S164-P3297   138,586,535    R296-P4531   144,982,512
S373-P4241   162,092,320    R490-P3207   142,380,928
S373-P4244   146,116,441    R490-P3210   164,412,241
S373-P4250   140,629,410    R490-P3214   169,970,555
S374-P5172    88,374,928    R506-P4734   165,472,466
S453-P5103   120,921,569    R506-P4735   154,474,148
S453-P5104   108,148,316    R506-P4736   173,988,590
S453-P5109   117,009,388    R169-V5230   159,863,685
S144-V4352   128,119,810    R169-V5235   146,783,488
S144-V4377   148,456,002    R169-V5239   144,377,378
S164-V3290   144,464,174    R296-V4519   146,536,970
S164-V3294   161,750,684    R296-V4524   148,798,769
S164-V3298   152,823,172    R490-V3211   162,273,604
S373-V4232   156,310,574    R490-V3218   123,559,977
S373-V4236   150,155,973    R490-V3224   151,672,989
S373-V4237   148,496,505    R506-V4743   181,173,536
S374-V5170   130,903,402    R506-V4745   156,598,756
S453-V5105   123,966,544    R506-V4777   164,427,358
S453-V5107   123,347,998
S453-V5108   112,341,672


Samples are labelled as S (sensitive) or R (resistant) followed by the tumourgraft (XP) line, 
treatment type (P, PT2399 or V, vehicle) and mouse identifier (ear tag). 
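
The sample-label convention in this table can be decoded mechanically. The Python sketch below is one way to do it; the `SampleLabel` type and `parse_sample_label` function are hypothetical names introduced only for illustration, and the pattern assumes labels of exactly the form shown in the table (for example, S144-P4340).

```python
import re
from typing import NamedTuple


class SampleLabel(NamedTuple):
    response: str      # "sensitive" or "resistant"
    tumourgraft: str   # tumourgraft (XP) line, e.g. "XP144"
    treatment: str     # "PT2399" or "vehicle"
    ear_tag: str       # mouse identifier


_LABEL = re.compile(r"^([SR])(\d+)-([PV])(\d+)$")
_RESPONSE = {"S": "sensitive", "R": "resistant"}
_TREATMENT = {"P": "PT2399", "V": "vehicle"}


def parse_sample_label(label: str) -> SampleLabel:
    """Split a label such as 'S144-P4340' into its four components."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized sample label: {label!r}")
    response, line, treatment, ear_tag = match.groups()
    return SampleLabel(_RESPONSE[response], f"XP{line}", _TREATMENT[treatment], ear_tag)


if __name__ == "__main__":
    print(parse_sample_label("S144-P4340"))
    print(parse_sample_label("R506-V4777"))
```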




Extended Data Table 3 | Number of differentially regulated RNAs across tumourgraft groups by RNA-seq analysis 


SP vs. SV RV vs. SV RP vs. SP 

(Cutoff) Up Down Up Down Up Down 

onolnas 195 297 «1776~=Ss«1766~=S—s«‘1640~—Ss«C18158 

(FDR < 0.05) 

T-Test (P-value < 0.01) 99 213 852 584 798 695

Wilcox (P-value < 0.01) 90 207 829 498 760 621

Uniform (all Δ>0 or Δ<0) 2 5 78 16 61 15


S, sensitive; R, resistant; V, vehicle; P, PT2399; FDR, false discovery rate. 
Extended Data Table 4 | Top 15 downregulated and top 15 upregulated pathways in sensitive tumours treated with PT2399

Ingenuity Canonical Pathways | Molecules | P-Value

Top 15 Downregulated pathways

Cell Cycle Control of Chromosomal Replication | CDC7,ORC1,MCM7,CDC45,MCM2,CDK2,ORC6,MCM3,MCM6,MCM4,MCM5,CDC6 | 7.94328E-16
Mitotic Roles of Polo-Like Kinase | KIF23,ESPL1,PRC1,CDC20,CCNB2,PPP2R2A,CDC7,PLK4,PTTG1,PLK1,CDK1,CCNB1,FBXO5 | 3.98107E-12
Cell Cycle: G2/M DNA Damage Checkpoint Regulation | TOP2A,BRCA1,AURKA,CKS1B,CKS2,CCNB2,PLK1,CDK1,CCNB1 | 2.51189E-08
Cyclins and Cell Cycle Regulation | CCNE2,SUV39H1,CCND1,CDK2,E2F1,CCNA2,CCNB2,PPP2R2A,CDK1,CCNB1 | 1.31826E-07
ATM Signaling | JUN,BLM,BRCA1,FANCD2,CDK2,RAD51,CCNB2,CDK1,CCNB1 | 1.38038E-07
GADD45 Signaling | CCNE2,BRCA1,CCND1,CDK2,CDK1,CCNB1 | 1.77828E-07
DNA damage-induced 14-3-3σ Signaling | CCNE2,BRCA1,CDK2,CCNB2,CDK1,CCNB1 | 1.77828E-07
Estrogen-mediated S-phase Entry | CCNE2,CCND1,CDK2,E2F1,CCNA2,CDK1 | 8.31764E-07
Role of BRCA1 in DNA Damage Response | BLM,BRCA1,FANCB,FANCD2,E2F1,FANCA,BRCA2,RAD51,PLK1 | 1.58489E-06
Hereditary Breast Cancer Signaling | BLM,BRCA1,FANCB,FANCD2,CCND1,E2F1,FANCA,BRCA2,RAD51,CDK1,CCNB1 | 1.8197E-06
Role of CHK Proteins in Cell Cycle Checkpoint Control | BRCA1,CDK2,E2F1,CLSPN,PPP2R2A,PLK1,CDK1 | 1.20226E-05
Ovarian Cancer Signaling | BRCA1,VEGFA,SUV39H1,LEF1,PTGS2,CCND1,E2F1,PRKAG2,BRCA2,RAD51 | 1.65959E-05
Pancreatic Adenocarcinoma Signaling | VEGFA,SUV39H1,PTGS2,CCND1,BIRC5,CDK2,E2F1,BRCA2,RAD51 | 1.99526E-05
Aryl Hydrocarbon Receptor Signaling | JUN,CCNE2,MCM7,CCND1,IL1A,CDK2,E2F1,CCNA2,IL6,ALDH1A3 | 2.29087E-05
Small Cell Lung Cancer Signaling | CCNE2,SUV39H1,PTGS2,CCND1,CDK2,E2F1,CKS1B | 6.45654E-05

Top 15 Upregulated pathways

Hepatic Fibrosis / Hepatic Stellate Cell Activation | COL19A1,COL4A6,FGFR2,IGFBP4,TGFB3,MYL4,PDGFD,COL21A1,MYL3,IGF1 | 8.13E-06
Axonal Guidance Signaling | GLIS2,CXCL12,BMP5,MYL4,PDGFD,EPHB3,SRGAP3,SMO,SEMA3F,IGF1,ADAMTS1,SEMA3G,BMP4,MYL3 | 2.09E-05
Human Embryonic Stem Cell Pluripotency | FGFR2,TGFB3,BMP5,PDGFD,BMP4,LEFTY2,SMO | 0.000145
PAK Signaling | MYL4,PDGFD,EPHB3,MYLK,MYL3 | 0.001
Basal Cell Carcinoma Signaling | GLIS2,BMP5,BMP4,SMO | 0.00302
RhoA Signaling | MYL4,MYLK,SEMA3F,MYL3,IGF1 | 0.00389
Agranulocyte Adhesion and Diapedesis | CXCL12,MYL4,MYL3,CCL11,CXCL14,MMP24 | 0.004266
Cellular Effects of Sildenafil (Viagra) | MYL4,ADCY5,MYLK,MYL3,PDE5A | 0.004467
TR/RXR Activation | THRSP,HIF1A,G6PC,AKR1C1/AKR1C2 | 0.00631
Regulation of Actin-based Motility by Rho | DIRAS3,MYL4,MYLK,MYL3 | 0.006918
Factors Promoting Cardiogenesis in Vertebrates | TGFB3,BMP5,BMP4,SMO | 0.007413
CXCR4 Signaling | DIRAS3,CXCL12,MYL4,ADCY5,MYL3 | 0.010233
Cardiomyocyte Differentiation via BMP Receptors | BMP5,BMP4 | 0.011482
Cardiac Hypertrophy Signaling | DIRAS3,TGFB3,MYL4,ADCY5,MYL3,IGF1 | 0.012303
Molecular Mechanisms of Cancer | DIRAS3,TGFB3,BMP5,BCL2L11,BMP4,ADCY5,HIF1A,SMO | 0.013183
Atomic model for the membrane-embedded Vo
motor of a eukaryotic V-ATPase


Mohammad T. Mazhab-Jafari, Alexis Rohou, Carla Schmidt, Stephanie A. Bueler, Samir Benlekbir, Carol V. Robinson &
John L. Rubinstein


Vacuolar-type ATPases (V-ATPases) are ATP-powered proton
pumps involved in processes such as endocytosis, lysosomal
degradation, secondary transport, TOR signalling, and osteoclast
and kidney function. ATP hydrolysis in the soluble catalytic
V1 region drives proton translocation through the membrane-embedded
Vo region via rotation of a rotor subcomplex. Variability
in the structure of the intact enzyme has prevented construction of
an atomic model for the membrane-embedded motor of any rotary
ATPase. We induced dissociation and auto-inhibition of the V1
and Vo regions of the V-ATPase by starving the yeast Saccharomyces
cerevisiae, allowing us to obtain a ~3.9-Å resolution electron
cryomicroscopy map of the Vo complex and build atomic models
for the majority of its subunits. The analysis reveals the structures of
subunits ac8c′c″de and a protein that we identify and propose to be
a new subunit (subunit f). A large cavity between subunit a and the
c-ring creates a cytoplasmic half-channel for protons. The c-ring has
an asymmetric distribution of proton-carrying Glu residues, with
the Glu residue of subunit c″ interacting with Arg735 of subunit a.
The structure suggests sequential protonation and deprotonation of
the c-ring, with ATP-hydrolysis-driven rotation causing protonation
of a Glu residue at the cytoplasmic half-channel and subsequent
deprotonation of a Glu residue at a luminal half-channel.

Following starvation-induced dissociation of the V1 and Vo regions
of V-ATPase, the auto-inhibited Vo complex is markedly more structurally
homogeneous than in the intact V-ATPase. Our 3D electron
cryomicroscopy (cryo-EM) map reveals the structure of the intact Vo
complex (Fig. 1a–c, Extended Data Fig. 1 and Supplementary Table 1)
with an overall resolution of 3.9 Å (Extended Data Fig. 2). Although
the resolution is variable throughout the map (Extended Data Fig. 2c),
side-chain densities are apparent for most of the α-helices in the complex,
allowing for the construction of an atomic model (Fig. 1d and
Extended Data Fig. 3). A few regions (for example, subunits c(5) and c(6)
in Extended Data Fig. 3g) only have sufficient resolution to model large
side chains; however, this still allows the protein sequence to be placed
in register in the map. Resolution was poorest for the soluble domain
of subunit a where it contacts subunit d, which we modelled entirely as
poly-alanine. The map reveals the structures of the c8c′c″-ring (Fig. 1a,
pink, magenta and purple), subunit a (Fig. 1a, green), subunit d (Fig. 1a,
cyan), and two additional proteins. These additional components
were detected in a lower-resolution analysis of the intact V-ATPase
complex, but can be seen here to consist of two transmembrane
α-helices each (Fig. 1b, c, blue and red-brown). Both of these proteins
appear to be stoichiometric components of the complex, as suggested
by their relative densities in the map. One of these subunits (Fig. 1b, c,
red-brown) abuts α-helix 2 and the loop connecting α-helices 7 and 8
of subunit a. The other protein is an α-helical hairpin that contacts
α-helix 1 of subunit a (Fig. 1b, c, blue). These two proteins are discussed
in more detail later on.


Subunit d sits on top of the c-ring, linking the V1 and Vo parts of the
central rotor. As seen previously in an ~18-Å resolution map of the Vo
complex, subunit d sits deeper within the c-ring in the Vo complex than
it does in the intact V-ATPase. In contrast to the intact V-ATPase structure,
in the Vo complex the N-terminal domain of subunit a from the
stator part of the enzyme is in contact with subunit d from the rotor.
This interaction would prevent rotation of the c-ring, consistent
with the observation that the Vo complex is impermeable to protons
when separated from the V1 region. However, the Vo complex
remains impermeable to protons even when subunit d or the N-terminal
domain of subunit a is removed. Furthermore, we also identified
a 3D class that lacks subunit d but otherwise appears to have almost
the same structure as the intact Vo complex (Extended Data Fig. 4).
Consequently, the movement of the N-terminal domain of subunit a,
and inhibition of proton translocation, cannot be due to interaction of
the N-terminal domain of subunit a with subunit d.

The fold of subunit a is consistent with the fold predicted by lower-resolution
cryo-EM studies of the intact V-ATPase combined with
evolutionary covariance analysis. The subunit contains eight membrane-embedded
α-helices, with structured loops connecting some of these
α-helices. The membrane-embedded domain of subunit a starts with a
pair of short α-helices that do not fully cross the lipid bilayer (Fig. 1b, c,
α-helices α1 and α2). Four subsequent transmembrane α-helices (α3 to α6)
produce a central layer in the subunit structure, which terminates with
two long and highly tilted transmembrane α-helices that contact the
c-ring (α7 and α8) and are characteristic of rotary ATPases. No density
was detected for the loop between residues 659 and 709, which
is the region with the least sequence similarity amongst the different
isoforms of subunit a. The lack of density for this loop suggests that it
is mobile in the structure. Another loop (residues 481–523) could also
not be modelled, although a region of low-resolution density in the
map (Fig. 1b, semi-transparent grey density) probably corresponds to
this loop and to parts of the two small membrane-embedded proteins
in the Vo complex.

Unlike the c-ring of F-type ATP synthases and bacterial V/A-ATPases,
the proton-carrying ring of the eukaryotic V-ATPase is
hetero-oligomeric. The S. cerevisiae V-ATPase c8c′c″-ring has 40
transmembrane α-helices arranged into an inner ring and outer ring of
20 α-helices each (Fig. 2). Each c, c′, and c″ subunit contributes four
α-helices to the c-ring, two to the inner ring and two to the outer ring.
The cryo-EM map shows two additional α-helices that pass through
the c8c′c″-ring (Fig. 2b, purple asterisk). These α-helices can be attributed
to 56 residues at the N terminus of subunit c″ (ref. 12), most of
which are not necessary for proton translocation and correspond to
density previously seen within the c-ring and upon re-examination of
the intact V-ATPase maps. The first additional α-helix is adjacent to
subunit c(1) and the second is near the centre of the c-ring with a clear
connection to the rest of subunit c″. Although map resolution is limited
in this region, it is sufficient to register the sequence of the α-helices
(Extended Data Fig. 3b). This structure for subunit c″ differs from a
crystal structure of a bacterial hybrid 'F/V'-ATPase c-ring, in which a
single additional N-terminal helix lay across the periplasmic surface
of the ring. Subunit c″ serves as the main contact between subunit
d and the c-ring (Fig. 1a, bottom, cyan asterisk), making it critical for
the transmission of ATP-driven rotation of subunits D and F of the V1
region to the c-ring of the Vo region. Comparison to the lower-resolution
cryo-EM maps of the intact V-ATPase (Extended Data Fig. 5a–c)
shows that the Vo complex is in the least populated, and probably the
least stable, of the three rotational states identified for the intact enzyme
(Extended Data Fig. 5d). Therefore, subunit c″ apparently marks the
position of the c-ring from which the complex can disassemble and
reassemble. An unexpected feature of the subunit c″ structure is the
location of its essential proton-carrying Glu residue. Each c, c′, and
c″ subunit has a single proton-carrying Glu residue (Glu137, 145, and
108, respectively), giving 10 proton-carrying residues for the 20 outer
α-helices of the ring. Notably, the Glu residue of subunit c″ is on its
second ring-forming α-helix, whereas in subunits c and c′ it is found
on the fourth ring-forming α-helix. This arrangement gives an irregular
distribution of acidic proton-carrying Glu residues, rather than having
proton-binding sites on alternating α-helices (Fig. 2b, right, in red). The
two adjacent negative charges of subunits c″ and c(1) that are in contact
with subunit a may determine the disassembly- and reassembly-competent
conformations of the enzyme.

1Molecular Structure and Function Program, The Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada. 2Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia

Figure 1 | The intact Vo complex. a, The Vo cryo-EM density map (semi-transparent
surface, top) shows subunits a (green), c (pink), c′ (magenta),
c″ (purple), d (cyan), e (blue), and f (red-brown). The asterisk indicates the
contact between subunits d and c″. b, c, An enlarged view and cartoon of
the C-terminal domain of subunit a with subunits e and f. Unfitted density
is shown as grey mesh. d, An illustration of how the cryo-EM map allows
atomic models to be built for most of the protein sequence. Scale bars,
25 Å.

Transmembrane proton pumping is thought to occur via two offset
half-channels: a cytoplasmic half-channel that protonates Glu residues
of the c-ring and a luminal half-channel that deprotonates Glu
residues. The cytoplasmic half-channel, which leads to Glu108
of subunit c″, is readily apparent from inspection of the model as a
deep cavity between the c-ring and the two highly tilted α-helices of
subunit a (Fig. 3a, b, blue density and arrow). At its opening to the
cytoplasm, this cavity is almost 20 Å across and 10 Å wide and is predicted
to be filled with water, providing direct access to Glu108
for protons from the cytoplasm. The location of this channel, at the
interface of subunit a and the c-ring, is consistent with biochemical
experiments and with the indentations observed in the detergent
micelles of lower-resolution cryo-EM density maps of the F-, V/A-,
and V-type ATPases. The position of the cytoplasmic half-channel
places it in close proximity to the loop of 51 residues between α-helices
α6 and α7 (659–709), which contains 21 charged residues. This loop
may be involved in modulating access to the half-channel. The
location of the luminal half-channel is less obvious. Inspection of
lower-resolution cryo-EM maps suggested the presence of a gap in the
detergent micelle. Consideration of residues conserved across species
suggested a proton path that involves the first and second membrane-embedded
α-helices of subunit a (Fig. 1c, α2 and α3). If this position
is the presumed location of the luminal half-channel, its exit is thus
located near to the region of unmodelled density in the cryo-EM
map (Fig. 3a, grey mesh). However, a clear path through the protein,
from the centre of the lipid bilayer to its luminal surface, is not readily
apparent from the model and could not be detected through automated
identification of potential pore-lining residues or by searching
for gaps in the protein. It should be recalled that the dissociated
Vo region is auto-inhibited and does not allow for proton translocation
and therefore the luminal half-channel may not exist in this
state. Alternatively, the luminal half-channel may remain closed in
proton-pumping V-ATPases, only opening transiently during proton
translocation. Future investigation of the path that protons take from
the centre of the lipid bilayer to the lumen is therefore warranted.

Figure 2 | Asymmetry of the c-ring. a, A ribbon model of the c-ring (left)
with a map section (right) shows the two N-terminal α-helices of subunit
c″ in the middle of the c-ring. b, A view of the c-ring model (left) and
cartoon (right) from the cytoplasm shows the asymmetric distribution of
mid-membrane Glu residues around the ring (red). The two N-terminal
α-helices of subunit c″ are marked with an asterisk (left). Scale bars, 25 Å.

Figure 3 | Cytoplasmic half-channel and subunit a/c-ring interaction.
a, b, A large cavity (blue density and arrow) is apparent between the c-ring
and subunit a in the expected position of the cytoplasmic half-channel.
The cavity is near the unresolved loop composed of residues 659 to 709
from subunit a (bottom of b, dashed red line). Scale bar, 25 Å. c, At the
mid-membrane terminus of the cavity, the essential residues Arg735 from
subunit a and Glu108 from subunit c″ are positioned to interact.

All rotary ATPase a subunits include an essential Arg residue
(Arg735 in the S. cerevisiae V-ATPase) that is necessary to couple ATP
synthesis or hydrolysis to proton translocation. The conserved Glu
residues of subunits c, c′, and c″ are also essential for proton transport
in the V-ATPase. In the auto-inhibited Vo complex, Glu108 of subunit
c″ interacts with Arg735 present on the penultimate α-helix of subunit
a (Fig. 3c). This interaction suggests that Glu108 is deprotonated.
The presumed salt bridge between Arg735 and Glu108 is similar to
those between Arg-Asp pairs shown to ensure proton selectivity in
voltage-gated proton channels. However, because rotary ATPases can
transport different ions and the architecture of rotary ATPases appears
to be generally conserved, the function of the Arg-Glu pair here is
probably to ensure that the Glu residue is stripped of its ion before it
leaves the luminal half-channel, rather than to ensure proton specificity.

Figure 4 | Irregularly spaced Glu residues require protonation before
deprotonation. a, The observed structure (top) with cartoon (bottom)
illustrates that when Glu108 of subunit c″ is aligned with the cytoplasmic
half-channel (blue arrow), Glu137 of subunit c(1) contacts subunit a (red
arrow) and could be aligned with a luminal half-channel. b, When Glu137
of a subunit c is aligned with the cytoplasmic half-channel (blue arrow),
Glu137 of the next subunit c is not in contact with subunit a (red arrow).
c, When Glu145 of subunit c′ is aligned to the cytoplasmic half-channel
(blue arrow), Glu108 of subunit c″ is too far from subunit a to interact (red
arrow). Scale bar, 25 Å. d, The sequence of protonation and deprotonation
events that occur during c-ring rotation, with each Glu residue given its
own colour.

Liquid chromatography-tandem mass spectrometry (LC-MS/MS)
analysis of in-gel-digested protein suggested that Vma9p (subunit e)
and putative protein YPR170W-B were present in the purified Vo complex
(Supplementary Table 2 and Supplementary Data). The α-helical
hairpin shown in blue in Fig. 1a–c was identified as subunit e because
its density fit with that of the three Trp residues and the Phe and Tyr
residues of subunit e (Extended Data Fig. 3e). The structure of subunit e,
and the interaction of its C terminus with a loop in subunit a,
explains why addition of a C-terminal affinity tag to subunit e causes
its dissociation from the detergent-solubilized complex. Its position
distal to the area of interaction between subunit a and the c-ring is
consistent with the observation that its removal from the assembled
V-ATPase does not affect proton-pumping activity. Deletion of the
VMA9 gene leads to the conditionally lethal VMA phenotype and
the absence of V-ATPase from the yeast vacuole, suggesting that subunit e
is necessary for the successful localization and assembly of the
complex. To confirm the presence of putative protein YPR170W-B as
the additional membrane-embedded component of the Vo complex,
we constructed an S. cerevisiae strain with a 3×FLAG-tag fused to the
C terminus of YPR170W-B. Affinity purification from the detergent-solubilized
membranes of this strain isolated the Vo complex with some
subunits from the intact V-ATPase (Extended Data Fig. 6a), confirming
the protein as a component of the Vo complex. We subsequently purified
the Vo complex from a strain of yeast lacking YPR170W-B and
determined its structure by cryo-EM (Extended Data Fig. 6b). The
missing density in the 3D map allowed us to unambiguously locate
YPR170W-B in the complex. Deletion of YPR170W-B, which we now
tentatively identify as subunit f of the V-ATPase, did not produce the
VMA phenotype (Extended Data Fig. 6c), indicating that it is not
essential for V-ATPase localization or for proton pumping. The peripheral
location of the protein, away from the interface of subunit a and the
c-ring, leaves its function in the V-ATPase complex unclear. A BLAST
search indicates that YPR170W-B is highly conserved in fungi. Reliable
identification of possible animal or plant homologues of subunit f will
require experiments with other organisms. The cryo-EM density for
YPR170W-B (red-brown in Fig. 1) had insufficient resolution to build
an atomic model; consequently this component of the complex was
modelled as poly-alanine.

The asymmetric distribution of Glu residues around the c-ring, and
how the ring interacts with subunit a, has implications for possible
modes of proton translocation. The spacing between Glu residues in
the c-ring has an increment of 36° between most Glu residues, but 18°
between the Glu residues of subunits c″ and c(1) (Fig. 2b, purple and light
pink), and 54° between the Glu residues of subunits c′ and c″ (Fig. 2b,
magenta and purple). As seen in this model, subunit c″ is positioned
at the protonating cytoplasmic half-channel (Fig. 4a, blue arrow). In
this orientation, subunit c(1) is in contact with subunit a at a possible
mid-membrane origin for the luminal half-channel (Fig. 4a, red arrow).
However, if the ring were rotated such that one of the c subunits (for
example, subunit c(1)) engages with the cytoplasmic half-channel
(Fig. 4b, blue arrow) then the next c subunit (subunit c(2) in this example)
would not be in contact with subunit a, making it unlikely that it is
near a luminal half-channel (Fig. 4b, red arrow). Misalignment with
any possible luminal half-channel is even more pronounced if subunit
c′ is engaged with the cytoplasmic half-channel (Fig. 4c, blue arrow), in
which case the Glu from subunit c″ is extremely far from contact with
subunit a (Fig. 4c, red arrow). This mismatch ensures that protonation
and deprotonation cannot always occur simultaneously. Instead, the
sequence of events for ATP-hydrolysis-driven proton pumping must be
that rotation of the c-ring, caused by ATP hydrolysis in the V1 region,
breaks the contact between Arg735 and a deprotonated Glu residue,
dragging the Glu residue into the hydrophobic environment of the lipid
bilayer (Fig. 4d). Thermodynamic analysis has shown that moving a
deprotonated Glu residue from an aqueous environment into the middle
of a hydrophobic lipid bilayer will force the residue to acquire a proton
and become neutralized. This thermodynamic property of Glu residues
ensures that the ring does not 'slip', allowing a deprotonated Glu
to move into the lipid bilayer, which would uncouple ATP hydrolysis
from proton pumping. Rotation of the c-ring brings a protonated Glu
residue out of the lipid bilayer and into alignment with the putative
luminal half-channel. As rotation of the ring continues, the Glu residue
encounters Arg735, possibly in a different orientation than observed in
this structure, causing deprotonation of the Glu residue via the luminal
half-channel and resetting the c-ring for a subsequent protonation–deprotonation
cycle. The conformation of Arg735 may also need to
adapt to accommodate interaction with the Glu residue of subunit c″,
compared to subunit c or c′. This sequence of events, in which only
one Glu residue at a time interacts with a half-channel, explains how
rotary ATPases can tolerate c-rings with variable and unequal distances
between the c-subunits.
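
The angular bookkeeping in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the ring order c″, c(1) … c(8), c′ and takes the 18°, 36° and 54° increments from the text; the names `RING`, `GAP_AFTER_DEG` and `glu_angles` are hypothetical.

```python
# Angular positions of the ten proton-carrying Glu residues around the c-ring.
# Assumption: subunits are ordered c'', c(1)..c(8), c' around the ring, with
# the Glu of c'' placed at 0 degrees as an arbitrary reference.

HELIX_STEP_DEG = 360 / 20  # 20 outer alpha-helices, so 18 degrees per helix

RING = ["c''"] + [f"c({i})" for i in range(1, 9)] + ["c'"]

# Angular gap from each Glu to the next one in RING order (values from the text).
DEFAULT_GAP_DEG = 36.0
GAP_AFTER_DEG = {"c''": 18.0, "c'": 54.0}


def glu_angles():
    """Return ({subunit: angle in degrees}, total angle after one full turn)."""
    angles = {}
    angle = 0.0
    for name in RING:
        angles[name] = angle
        angle += GAP_AFTER_DEG.get(name, DEFAULT_GAP_DEG)
    return angles, angle


if __name__ == "__main__":
    angles, total = glu_angles()
    for name, angle in angles.items():
        print(f"{name:>5}: {angle:5.1f} deg")
    print("sum of gaps:", total)
```

Under these assumptions the ten gaps (one of 18°, eight of 36° and one of 54°) close the ring at exactly 360°, and every gap is a whole multiple of the 18° helix step, in line with Glu residues sitting on individual outer α-helices.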


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Yeast strains and protein purification. To isolate the S. cerevisiae Vo complex,
yeast strain CACY1, expressing Vph1p (subunit a) with a 3×FLAG-tag at its C
terminus, was prepared by homologous recombination as described previously.
To confirm that YPR170W-B is a subunit of the V-ATPase, an S. cerevisiae strain
was prepared with a 3×FLAG-tag fused to the C terminus of the uncharacterized
open reading frame YPR170W-B (Saccharomyces genome database identifier
S000028515) in the protease-deficient background strain MM93, producing the
strain SABY87. To purify intact V-ATPase, we used strain SABY31, which bears
a 3×FLAG-tag on Vma1p and has the STV1 gene deleted. YPR170W-B was
deleted in strain CACY1 by homologous recombination with the NatR cassette
to produce strain MMJY1 as previously described. Western blotting against
3×FLAG-tagged proteins was done with the monoclonal anti-FLAG antibody
M2 (Sigma). For protein purification, yeast in YPD medium were grown in an
11-l BioFlo fermenter (New Brunswick Scientific) for 60 h in order to induce nutrient
starvation and dissociation of the V1 and Vo regions of the complex. Vo complex
solubilized with dodecylmaltoside (DDM) was purified via the 3×FLAG-tag
with M2 affinity matrix (Sigma), following the same protocol used for the intact
V-ATPase. After purification in elution buffer (50 mM Tris-HCl pH 7.5, 150 mM
NaCl, 0.02% (w/v) DDM, 50 µg/ml 3×FLAG peptide), the Vo complex was mixed
with amphipol A8-35 (Anatrace) at a protein:amphipol ratio of 1:10 (w/w) with
gentle agitation for 1 h. Detergent was removed with 15 mg/ml Bio-Beads SM-2
(Bio-Rad) at 4 °C overnight. The sample was purified further with a Superdex 200
column previously equilibrated with buffer (50 mM Tris-HCl pH 7.5, 150 mM
NaCl). Protein from the chromatographic peak corresponding to the Vo complex
was collected and concentrated to 2.5 mg/ml with a 100-kDa MWCO Amicon
concentrator (Millipore) and bafilomycin A1 (Santa Cruz Biotechnology) was
added to 10 µM before further analysis.

LC-MS/MS and database search. Subunits of the Vo preparation were separated
by SDS-PAGE and regions of the gel were excised at positions where small
transmembrane α-helical hairpin subunits are expected. Proteins were digested in the
gel as described previously with trypsin and chymotrypsin at 37 °C and 25 °C,
respectively. For combined tryptic and chymotryptic digestion, proteins were
digested with trypsin at 37 °C for 3 h before chymotrypsin was added and the
sample was incubated at 25 °C overnight. Peptides dissolved in 2% (v/v) acetonitrile
and 0.1% (w/v) formic acid were separated by nano-flow liquid chromatography
(Dionex UltiMate 3000 RSLC, Thermo Scientific; mobile phase A: 0.1% (v/v) formic
acid; mobile phase B: 80% (v/v) acetonitrile, 0.08% (v/v) formic acid). Peptides
were then loaded onto a trap column (Reprosil C18, 100 µm inner diameter, particle
size 5 µm; Dr. Maisch GmbH, prepared in-house) and separated with a flow rate
of 300 nl/min on an analytical C18 capillary column (Reprosil C18, 75 µm inner
diameter, particle size 1.9 µm, 27–28 cm; Dr. Maisch GmbH, prepared in-house),
with a gradient of 5–90% (v/v) mobile phase B over 46 min. Separated peptides
were directly eluted into an Orbitrap Fusion Tribrid Mass Spectrometer (Thermo
Scientific). Typical mass spectrometric conditions were: spray voltage of 2.3 kV;
capillary temperature of 275 °C; collision energy of 30%, activation Q of 0.25. The
Orbitrap Fusion Tribrid Mass Spectrometer was operated in data-dependent mode.
Survey full-scan MS spectra were acquired in the orbitrap (m/z 380–1,500) with a
resolution of 120,000 and an automatic gain control (AGC) target at 400,000. The
top-10 most intense ions were selected for higher-energy collisional dissociation
MS/MS fragmentation in the orbitrap at a resolution of 30,000 and an AGC target
of 1,200 and with a first m/z of 110. Dynamic exclusion of previously selected ions
was set to 30 s. Only ions with charge states 2-7 were selected. Singly and doubly
charged ions, as well as ions with an unrecognized charge state, were also excluded.
Internal calibration of the orbitrap was performed with the lock mass option (lock
mass, m/z 445.120025). Raw files were converted into mgf files using Proteome
Discoverer (Thermo Scientific). Mgf files were searched against Uniprot_Yeast
database (23,481 sequences) using Mascot search engine v2.03.2002 (Matrix
Science). Search parameters were: peptide mass tolerance, 10 p.p.m.; fragment mass
tolerance, 0.6 Da; enzyme, trypsin; variable modifications, carbamidomethylation
(cysteine) and oxidation (methionine).
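
To give a sense of scale for the two search tolerances, a relative tolerance in p.p.m. can be converted to an absolute window. This is a minimal sketch rather than the search engine's own logic; `ppm_window` and the 1,500-Da example mass are hypothetical and chosen only for illustration.

```python
def ppm_window(mass: float, tol_ppm: float) -> tuple[float, float]:
    """Return the (low, high) bounds of a +/- tol_ppm window around mass."""
    delta = mass * tol_ppm * 1e-6  # parts per million -> absolute mass units
    return mass - delta, mass + delta


PEPTIDE_TOL_PPM = 10.0    # peptide mass tolerance stated in the search parameters
FRAGMENT_TOL_DA = 0.6     # fragment mass tolerance stated in the search parameters
EXAMPLE_MASS_DA = 1500.0  # hypothetical peptide mass, for illustration only

if __name__ == "__main__":
    low, high = ppm_window(EXAMPLE_MASS_DA, PEPTIDE_TOL_PPM)
    print(f"peptide window: {low:.4f} to {high:.4f} Da")
    print(f"fragment window is {FRAGMENT_TOL_DA / (high - EXAMPLE_MASS_DA):.0f}x wider")
```

For a peptide of this size the 10 p.p.m. precursor window is on the order of hundredths of a dalton, far narrower than the 0.6 Da allowed for fragments.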

Cryo-EM and image analysis. Vo complex (2.5 µl) in amphipol was applied to
nanofabricated, holey-gold-coated EM grids previously glow-discharged in air for 15 s,
blotted for 15 s, and then plunge-frozen with a modified Vitrobot grid-preparation
device (FEI Company) in a mixture of liquid ethane and propane at liquid-nitrogen
temperature. Cryo-EM grid preparation conditions were optimized and a small
dataset consisting of 1,082 images was obtained with a field-emission Tecnai F20 electron
microscope (FEI Company) operating at 200 kV, with images recorded on a
Gatan K2 Summit (Gatan Inc.) direct detector device camera (counting mode,
2 frames/s, 15 s, 1.45 Å/pixel, 1.2 e⁻/Å²/frame). An initial model for the Vo complex
was generated by manually segmenting a map of the intact V-ATPase with
UCSF Chimera and low-pass filtering to 30 Å. Image analysis with Alignframes_lmbfgs,
CTFFIND3 (ref. 37), Alignparts_lmbfgs, magnification anisotropy
correction and Relion 1.3 (ref. 39) produced a preliminary map at 6.8-Å
resolution from 39,384 particle images. The structure of the Vo complex in
DDM was also determined from yeast strains CACY1 (wild type) and MMJY1
(ΔYPR170W-B). These specimens were prepared with nanofabricated
holey-carbon-coated EM grids subjected to glow-discharge for 2 min and blotted for
20 s. 3D maps were calculated at 8.3-Å resolution from 43,184 particle images
(for CACY1) and at 8.7-Å resolution from 44,468 particle images (for MMJY1).

For high-resolution image acquisition, grids were sent in a cryogenic specimen 
shipper to the HHMI Janelia Research Campus where they were imaged with a 
Titan Krios electron microscope (FEI Company). Micrographs were recorded from 
a single grid with the microscope operated at 300 kV using parallel illumination 
at 4.8 e⁻/Å²/s of a 1.58-µm diameter region of the grid from a 70-µm objective 
aperture. Images were recorded with a K2 Summit direct detector device camera 
operating in super-resolution mode with a nominal magnification of 37,000×. 
With no specimen present in the optical path, the rate of exposure of the detector 
was 3 e⁻/pixel/s. Dose-fractionated exposures of 21 s were recorded as movies with 
70 frames, so that selected specimen areas were exposed to a total of 100 e⁻/Å². 
Data collection was automated with SerialEM*°. A previously measured magni- 
fication anisotropy was corrected with the program mag_distortion_correct*!, 
leading to a super-resolution pixel size of 0.3885 Å. Frames were down-sampled 
to a pixel size of 1.554 Å by Fourier-space cropping and aligned with each other 
using Unblur®’. Probably because of the high magnification and small field of view, 
no apparent advantage was detected for correcting images for individual particle 
motion**. Defocus parameters were estimated with CTFFIND4 (ref. 43) from the 
average of amplitude spectra of sums of 3 unaligned frames resampled to a pixel 
size of 1.94 Å. 657,975 particle images were automatically selected with Relion from 
4,365 aligned and averaged movies, and extracted into 200 × 200 pixel boxes. 2D 
classification with Relion reduced the dataset to 462,842 particle images. 
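
The exposure and pixel-size figures in this paragraph can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the numbers quoted in the text; the variable and function names are illustrative and do not come from any acquisition or processing software.

```python
# Cross-check of the acquisition figures quoted in the text.
# Variable and function names are illustrative only.

DOSE_RATE = 4.8            # exposure rate at the specimen, electrons per A^2 per second
EXPOSURE_TIME = 21.0       # duration of one dose-fractionated movie, seconds
N_FRAMES = 70              # frames per movie
SUPER_RES_PIXEL = 0.3885   # super-resolution pixel size, A
BINNED_PIXEL = 1.554       # pixel size after Fourier-space cropping, A
BOX_SIZE = 200             # particle box edge, pixels


def total_exposure(rate, seconds):
    """Total exposure in electrons per A^2."""
    return rate * seconds


def exposure_per_frame(rate, seconds, frames):
    """Exposure carried by each movie frame, electrons per A^2."""
    return total_exposure(rate, seconds) / frames


def downsampling_factor(fine_pixel, coarse_pixel):
    """Linear factor by which the pixel size grows on down-sampling."""
    return coarse_pixel / fine_pixel


total = total_exposure(DOSE_RATE, EXPOSURE_TIME)
per_frame = exposure_per_frame(DOSE_RATE, EXPOSURE_TIME, N_FRAMES)
factor = downsampling_factor(SUPER_RES_PIXEL, BINNED_PIXEL)
nyquist_limit = 2 * BINNED_PIXEL      # finest spacing the binned images can represent, A
box_edge = BOX_SIZE * BINNED_PIXEL    # physical edge length of a particle box, A

print(f"total exposure:     {total:.1f} e-/A^2")
print(f"exposure per frame: {per_frame:.2f} e-/A^2")
print(f"down-sampling:      {factor:.1f}x")
print(f"Nyquist limit:      {nyquist_limit:.3f} A")
print(f"box edge:           {box_edge:.1f} A")
```

The product of rate and time comes to 100.8 e⁻/Å², which the text rounds to 100, and the down-sampled pixel size is four times the super-resolution one. The Nyquist limit of the binned images (about 3.1 Å) is finer than the 3.9-Å resolution reported later for the final map.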

The map showing the Vo complex lacking subunit d was obtained by 3D 
classification and refinement with Relion (ref. 39). The 6.8-Å resolution map of the Vo 
complex in amphipol from the Tecnai F20 dataset was used as a starting reference 
for aligning the full dataset of particle images with Frealign“. In all subsequent 
processing steps, information beyond 6-Å resolution was excluded to prevent 
over-fitting of noise at higher resolutions. Initial particle orientation parameters 
were obtained by 1 cycle of grid search in Frealign mode 3, using an angular step 
size of 3.4°. Following this cycle, and before every subsequent cycle, the solvent and 
amphipol regions of the map were low-pass-filtered to 30 Å to reduce the fitting of 
noise from solvent and amphipol densities*®. Final particle-orientation parameters 
were obtained after 19 cycles of local refinement (mode 1), one cycle of mode 
3 exhaustive search, and 11 further cycles of local refinement. The final 3D map was 
calculated from particle images padded with zeroes from 200 × 200 to 400 × 400 
pixels to mitigate CTF aliasing effects*° and was then cropped to a 200 × 200 × 200 
voxel volume. Overall resolution of the map was estimated by Fourier shell correla- 
tion to be 3.9 Å. Local resolution variability was estimated with blocres’’. 
Model building. Most regions of the map, particularly membrane-embedded 
α-helices, had sufficient detail to allow de novo model building. Initial models for 
subunits c, c′, c″, d, and the N-terminal domain of subunit a were generated with 
the SWISS-MODEL server*® using a model of intact V-ATPase from cryo-EM 
(Protein Data Bank accession number 3J9V) as a template. The initial model for 
the membrane-embedded domain of subunit a was from the cryo-EM and evolu- 
tionary covariance analysis of that protein (Protein Data Bank accession number 
5I1M)”. Initial models for subunits e and f were generated manually in Coot”. 
Initial models were fit into density as rigid bodies with UCSF Chimera”. Final 
models were built with successive rounds of real-space refinement in Phenix”? 
and manual model building in Coot and gave an EMRinger score of 2.0 for the 
entire model*!, which is superior to the typical score of 1.0 for a 4-Å map. 93.2, 
6.3, and 0.5% of residues were in preferred, allowed, and disallowed regions of the 
Ramachandran plot, respectively, with no Ramachandran outliers in the α-helical 
regions of the model. Where no corresponding densities were observed, side chains 
were deleted while maintaining the residue identity. Subunit f and the N-terminal 
domain of subunit a were modelled entirely as poly-alanine. 
Extended Data Figure 1 | Subunit composition of the intact V-ATPase and dissociated V1 and Vo regions. The rotor is outlined in black and the two 
half-channels in the Vo region are indicated with dashed lines. The intact V-ATPase (left) dissociates into the auto-inhibited V1 and Vo complexes upon 
nutrient starvation. Figure adapted from ref. 1. 
Extended Data Figure 2 | Cryo-EM map generation. a, An example micrograph with protein particles circled in red. Scale bar, 500 Å. 
b, Fourier shell correlation (FSC) curve. The highest-resolution information used in image alignment (6 Å) and the overall resolution 
of the map at FSC = 0.143 (3.9 Å) are indicated. c, Local resolution assessment. Scale bar, 25 Å. d, Image orientation distribution. 
e, Example 2D class average images. 
Extended Data Figure 3 | Model building. a–f, Example regions of the atomic model built for subunits a (a), c″ and c′ (b), c(1) (c), d (d), e (e), and f (f). 
g, The different α-helices from the c-ring bearing conserved Glu residues show variable resolution. An α-helix from the N-terminal domain of subunit a 
has poor resolution. Residue numbers are shown in brackets. 
Extended Data Figure 4 | Vo complex lacking subunit d. a, The Vo complex map from all of the particle images shows subunit d. b, Vo complex map 
from a 3D class, containing 24,744 particle images, that lacks subunit d was determined at 7.8-Å resolution. Scale bar, 25 Å. 
Extended Data Figure 5 | Vo complex is in rotational state 3. a–c, Rotational states 1, 2, and 3 of the intact V-ATPase show the two α-helices of subunit 
c″ within the c-ring’. d, The two α-helices of subunit c″ within the c-ring show the ring to be in the same orientation as in rotational state 3 of the intact 
V-ATPase. Scale bar, 25 Å. 
Extended Data Figure 6 | Identification of subunit f. a, SDS-PAGE gel (left) and western blot (right) against a 3× FLAG-tag for the affinity 
purification of 3× FLAG-tagged YPR170W-B (subunit f) and Vma1p (subunit A) show that both proteins are components of the V-ATPase. 
b, Surface-rendered 3D maps (upper) and map cross-sections (lower) showing the wild-type Vo complex (left) and the Vo complex from a 
yeast strain with the YPR170W-B gene deleted (right). Density from YPR170W-B is indicated with a red arrow. Scale bar, 25 Å. c, Yeast strains 
with the STV1 and VPH1 genes deleted, the STV1 and YPR170W-B genes deleted, and only the STV1 gene deleted were grown on both YPD medium 
(left) and YPD medium with zinc (right), demonstrating that deletion of YPR170W-B does not cause the VMA phenotype. 
CORRECTIONS & AMENDMENTS 


CORRIGENDUM 
doi:10.1038/nature19335 


Corrigendum: Lytic to temperate 
switching of viral communities 


B. Knowles, C. B. Silveira, B. A. Bailey, K. Barott, V. A. Cantu, 
A. G. Cobián-Güemes, F. H. Coutinho, E. A. Dinsdale, B. Felts, 
K. A. Furby, E. E. George, K. T. Green, G. B. Gregoracci, 
A. F. Haas, J. M. Haggerty, E. R. Hester, N. Hisakawa, 
L. W. Kelly, Y. W. Lim, M. Little, A. Luque, T. McDole-Somera, 
K. McNair, L. S. de Oliveira, S. D. Quistad, N. L. Robinett, 
E. Sala, P. Salamon, S. E. Sanchez, S. Sandin, G. G. Z. Silva, 
J. Smith, C. Sullivan, C. Thompson, M. J. A. Vermeij, M. Youle, 
C. Young, B. Zgliczynski, R. Brainard, R. A. Edwards, J. Nulton, 
F. Thompson & F. Rohwer 


Nature 531, 466–470 (2016); doi:10.1038/nature17193 


In this Article, the ‘Predator–prey modelling’ section of the Methods 
shows Lotka–Volterra equations. Although these equations are meant 
to present a basic Lotka–Volterra model, the term ‘N/K’ in the second 
equation was inadvertently introduced during proofing, which makes 
the equations reflect the Piggyback-the-Winner model rather than 
basic Lotka–Volterra. The equation should have read: 


δV/δt = (β·φ·N·V) − (m·V) 
rather than: 
δV/δt = (β·φ·N/K·N·V) − (m·V) 


This change does not affect the conclusions of the paper, and has been 
corrected online. 
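
To make the difference between the two forms concrete, the sketch below evaluates both right-hand sides for the same inputs. It is an illustration only. The symbol meanings given in the comments (β as burst size, φ as adsorption coefficient, N as host density, V as viral density, K as host carrying capacity, m as viral decay rate) are the usual reading of such models and are assumed here, not quoted from the Article. The function names and all numerical values are invented.

```python
def dv_dt_lotka_volterra(beta, phi, n, v, m):
    """Corrected form: dV/dt = (beta * phi * N * V) - (m * V).

    Assumed meanings: beta burst size, phi adsorption coefficient,
    n host density, v viral density, m viral decay rate.
    """
    return beta * phi * n * v - m * v


def dv_dt_with_n_over_k(beta, phi, n, v, m, k):
    """Form printed in error: dV/dt = (beta * phi * N/K * N * V) - (m * V).

    k is assumed to be the host carrying capacity.
    """
    return beta * phi * (n / k) * n * v - m * v


if __name__ == "__main__":
    # Invented parameter values, chosen only to show the effect of N/K.
    beta, phi, m, k = 50.0, 1e-8, 0.1, 1e6
    v = 1e6
    for n in (1e5, 5e5, 1e6):
        basic = dv_dt_lotka_volterra(beta, phi, n, v, m)
        scaled = dv_dt_with_n_over_k(beta, phi, n, v, m, k)
        print(f"N = {n:.0e}: basic = {basic:.3e}, with N/K = {scaled:.3e}")
```

With these inputs the two expressions coincide only when N equals K. At lower host densities the extra N/K factor shrinks the production term, which is the density dependence that the corrigendum attributes to the Piggyback-the-Winner model.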


CORRIGENDUM 
doi:10.1038/nature19837 


Corrigendum: Noncanonical 
autophagy inhibits the 
autoinflammatory, lupus-like 
response to dying cells 


Jennifer Martinez, Larissa D. Cunha, Sunmin Park, Mao Yang, 
Qun Lu, Robert Orchard, Quan-Zhen Li, Mei Yan, Laura Janke, 
Cliff Guy, Andreas Linkermann, Herbert W. Virgin & 
Douglas R. Green 


Nature 533, 115-119 (2016); doi:10.1038/nature17950 


In Fig. 2a of this Letter, during the preparation of the final figures, the 
Cre⁻ Atg5f/f representative image was inadvertently duplicated in lieu of 
the Nox2⁻/⁻ representative image. In Extended Data Fig. 2d, the Cre⁺ 
ATG7f/f representative image for Ki67 immunohistochemical staining 
was inadvertently duplicated in lieu of the NOX2⁻/⁻ representative 
image. We sincerely apologize for these errors. The corrected image 
for Fig. 2a is shown in Fig. 1 to this Corrigendum, and the corrected 
Extended Data Fig. 9 is shown in the Supplementary Information. 
Quantifications in Fig. 2b are unaffected by this error. 


Supplementary Information is available in the online version of the 
Corrigendum. 


[Figure 1, panel a. Column labels: Atg7, Atg5, Becn1, Fip200, Ulk1, Nox2, Rubcn.] 


Figure 1 | This shows the corrected Fig. 2a from the original Letter. 
ILLUSTRATION BY THE PROJECT TWINS. 


MINING THE SECRETS OF 
COLLEGE SYLLABUSES 


The creators of the Open Syllabus Project hope that sharing data 
can both improve and reward teaching. 


BY ANNA NOWOGRODZKI 


Despite a growing movement to glean 
insights from scholarly materials that 
are available online — from articles 
and data sets to conference presentations and 
lectures — one kind of academic document 
remains little examined. And that is the syllabus: 
a document that lays out the reading materials, 
topics and expectations of college courses. 
That, at least, was the case until January 
this year, when data scientists, sociologists 
and digital-humanities researchers at Colum- 
bia University in New York City launched a 
tool called the Open Syllabus Explorer. This 
integrates more than 1 million publicly avail- 
able syllabuses and lays open their data in 
a conveniently searchable format. A version 
containing three times as many syllabuses 
is scheduled to launch in January 2017. 


The team behind the tool, the Open 
Syllabus Project (OSP), hope to nudge univer- 
sities towards making more syllabuses public. 
They argue that doing so could aid textbook 
authors, instructors and course developers, 
and would reward the design of effective teach- 
ing materials, which is largely overlooked by 
conventional measures of academic effort. 

“Syllabuses are among the most impor- 
tant documents written by scholars which 
are not yet widely shared, and they ought to 
be,” says Peter Suber, director of the Harvard 
Open Access Project and the Harvard Office 
for Scholarly Communication in Cambridge, 
Massachusetts, who serves on the OSP advi- 
sory board. “They reflect serious scholarly 
judgements about what's worth teaching.” 

Such judgements can be welcome news 
to textbook authors. Stuart Russell, a com- 
puter scientist at the University of California, 
Berkeley, didn't realize until Nature interviewed 
him for this article that his 1995 book Artificial 
Intelligence (Prentice Hall), co-authored with 
Peter Norvig, was the most highly assigned 
text in the field of computer science. “I was 
definitely surprised,” he says. 

Beyond stoking professional pride, such 
information could strengthen tenure and pro- 
motion packages. Authoring a textbook, no 
matter how useful and informative it might 
be, generally yields few citations in scholarly 
papers, so its academic impact is likely to be low. 
The OSP could help to shift the balance. “We're 
at a point in time when I think faculty have to 
take more ownership of their whole record of 
scholarship, of impact, of influence,” says Amy 
Brand, director of the MIT Press. Hard data on 
syllabus usage, she says, could empower faculty 
members “to tell their own story about what 
their work is doing in the world”. 
At present, Open Syllabus Explorer 
searches more than 1 million syllabuses dating 
back to 2000, cross-referenced with 20 million 
texts, to produce data on how often a text is 
taught. Users can search those data by author, 
title, institution and academic discipline. The 
tool also reports which textbooks are com- 
monly used together, and ranks each text on 
how frequently it is taught (see ‘Top texts’). 

An updated version, due to become avail- 
able on 21 January 2017, the Explorer’s first 
anniversary, will feature 3 million syllabuses 
cross-referenced with about 150 million texts; 
these will include titles from the arXiv preprint 
server, CrossRef and the Virtual International 
Authority File — which links together identical 
bibliographic records from different national- 
library catalogues. The update will include new 
search options, such as the ability to search by 
date or type of institution, says Joe Karaganis, 
the OSP’s project director. The new version will 
also incorporate better Canadian and UK data, 
information about where to find materials and, 
eventually, full-text syllabuses, if the authors 
have given permission to reproduce them. 

“We have some big ambitions,” Karaganis 
says. “All the techniques are very crude at pre- 
sent but they’re all improvable, and the data 
science is only getting better.” 


FISHING FOR CITATIONS 
The OSP is based at the American Assembly, 
a public-policy institute at Columbia Univer- 
sity, and is funded by the Sloan Foundation and 
the Arcadia Fund. It was inspired by a search 
engine called Syllabus Finder, which scraped the 
public web for syllabuses from 2002 (the year it 
was built) until 2009. That tool was created by 
Dan Cohen, then a historian at George Mason 
University in Fairfax, Virginia, who is now exec- 
utive director of the Digital Public Library of 
America. It amassed what Cohen says was then 
the largest collection of syllabuses ever assem- 
bled, comprising about 1 million documents. 
He released the URLs as a database in 2011. 
Unlike the OSP, Cohen's tool provided links 
to the full text of each syllabus. But it included 
only courses run up to 2009, when he had to 
retire the tool because of changes to Google's 
programming interface — a move that vexed 
Cohen's colleagues, including his wife, an early- 
childhood educator. “I still get e-mails begging 
me to turn the Syllabus Finder back on,” he says. 
When the OSP began in 2014, the team 
built tools to scrape the public Internet — 
including the links used by Cohen, who had 
lost a portion of the data owing to a coding 
error. But, as Cohen was, the team is limited 
to publicly accessible syllabuses: about 6 mil- 
lion of an estimated 80 million to 120 million 
syllabuses in the United States alone, by 
Karaganis’s reckoning. Syllabuses sealed 
behind the walls of private course-manage- 
ment software, such as Blackboard, remain 
out of reach. “Columbia, for example, is 
sitting on 80,000 syllabuses from the last 
12 or 13 years,” says Karaganis. “A large state 
school could have two, three times that.” 


TOP TEXTS 
The 5 most-taught scientific texts according to OSP. 

Textbook (author)                                             Syllabuses 
Biology: Concepts and Connections (N. A. Campbell et al.)     2,196 
Fundamentals of Anatomy and Physiology (F. Martini et al.)    752 
Chemistry (R. Chang)                                          612 
Human Anatomy & Physiology (E. N. Marieb and K. Hoehn)        605 
Human Anatomy (E. N. Marieb et al.)                           591 

Data filtered by fields: Astronomy and Astrophysics, Biology, 
Chemistry, Computer Science, Earth Sciences, Engineering, 
Psychology, Sociology. 


The OSP team then had to build tools to 
extract what those syllabuses contained. Cita- 
tions, for instance, had no consistent structure, 
says David McClure, the project's technical 
director. The tool searched for titles by cross- 
referencing each syllabus against a database of 
20 million titles — 11 million from Harvard 
LibraryCloud and 9 million from JSTOR. A 
matching title and author counted as a cita- 
tion. “We built in different techniques for 
allowing fuzziness, like allowing the word ‘by’ 
in between the author and title,” says McClure. 
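
A matcher of the kind McClure describes might look roughly like the sketch below. This is not the OSP's code: the function names, the normalization steps and the size of the permitted gap are assumptions made for illustration, and the example strings are invented.

```python
import re


def normalize(text: str) -> str:
    """Lower-case, turn punctuation into spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def cites(syllabus: str, title: str, author_surname: str,
          max_gap_words: int = 2) -> bool:
    """Return True if the title and surname appear next to each other.

    Either order is accepted, and up to `max_gap_words` filler words
    (such as 'by') may sit between the two. Both the gap size and the
    use of the surname alone are illustrative assumptions.
    """
    haystack = normalize(syllabus)
    t = re.escape(normalize(title))
    a = re.escape(normalize(author_surname))
    # Zero to max_gap_words words, each preceded by whitespace, then whitespace.
    gap = r"(?:\s+\w+){0,%d}?\s+" % max_gap_words
    pattern = rf"\b(?:{t}{gap}{a}|{a}{gap}{t})\b"
    return re.search(pattern, haystack) is not None


if __name__ == "__main__":
    line = "Week 3: Artificial Intelligence by Russell, ch. 2"
    print(cites(line, "Artificial Intelligence", "Russell"))
```

Allowing a small gap is what lets "Title by Author" count as a match while still rejecting a title and a surname that merely occur somewhere in the same document.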


A NEW METRIC 

The OSP distils those data down to a single 
metric called the teaching score, which indi- 
cates how often a text is assigned in syllabuses. 
It can take any value from 1 (rarely taught) to 
100 (frequently taught). 

According to Suber, teaching scores are an 
alternative to conventional metrics of scholarly 
impact. They reflect the burgeoning ‘alternative 
metrics’ ethos, which aims to quantify the whole 
of a person's research output. “I think this teach- 
ing score can take part in the new alt-metrics 
movement and give us a more sensitive meas- 
urement of the impact of texts,” he says. 

Already, a handful of researchers and 
universities are using the data to do just that. 
The University of Kentucky in Lexington 
issued a press release when it discovered that 
a paper by Edward Morris, one of its faculty 
members, ranked 46 out of 13,225 sociology- 
related texts. It now ranks 371 out of 53,177, 
and Morris plans to use the figure to support 
his promotion to full professor. 

US universities aren't the only ones paying 
attention. Most of the roughly 1,000 visits to 
the OSP each day are from the United States, 
says Karaganis, but significant traffic comes 
from Ukraine, Russia and Egypt as well. 

Other researchers have used the data to 
compile lists of widely taught graphic novels 
and comics, for instance, or to quantify the 
fraction of frequently taught sociology texts 
authored by women. Melanie Martin, a post- 
doc at Yale University in New Haven, Con- 
necticut, used the Syllabus Explorer to identify 
the most commonly taught texts in her field, 
evolutionary anthropology. But, because 
there is no way to search the database by 
subfield — for instance, limiting biology 
results to such subdisciplines as neurosci- 
ence or genomics — she had to scan the 
16,000 anthropology titles manually. “Without 
better filtering, I think it’s limited,” she says. 


BUILDING ON PEER EXPERTISE 

Another possible application of OSP data 
involves course design. By enabling faculty 
members — particularly junior ones — to 
build on the knowledge of their peers, the OSP 
could help them to teach more creatively, such 
as by identifying new ways to present teaching 
material. “This could go a long way to improv- 
ing the quality of instruction,” says Russell. It 
would also improve efficiency, leaving faculty 
members more time for other activities such as 
research and mentoring. 

However, it is important not to over-interpret 
the data, says Lisa Janicke Hinchliffe, a special- 
ist in information literacy at the University of 
Illinois at Urbana—Champaign. The project's 
sample set might not be a good proxy for all 
syllabuses, even at a particular institution. For 
instance, the second most-assigned text at Har- 
vard, according to the Explorer, is ‘Letter from 
Birmingham Jail’ by Martin Luther King Jr. 
But about 80% of the OSP’s Harvard syllabuses 
come from the John F. Kennedy School of Gov- 
ernment, Karaganis says (although the OSP 
doesn't publicly list its sources in this much 
detail). So it’s not possible to conclude how 
popular this text is at Harvard overall. 

For Hinchliffe, the value of the OSP lies in 
its ability to reveal the breadth of resources 
that instructors use. “I don't need a definitive 
‘These are the top-six taught books’,” she says. 
“I want to see the variety.” 

Such information could go a long way 
towards simplifying course design, a notori- 
ously time-consuming process. Just ask Suber, 
who has been teaching philosophy for 21 years. 
“Whenever I knew a new course was coming, I 
would try to start preparing it at least a year in 
advance,” he says. “Writing 40 lectures is a huge 
job; it’s harder than writing a book.” 

The OSP’s data could ease that burden. Plus, 
says Suber, the data are fun to explore, some- 
times revealing unexpected pairings. His legal 
philosophy text, The Case of the Speluncean 
Explorers (Routledge, 1998), for instance, has 
been taught alongside Sappho’s lyric poetry. 
“There are partners or juxtapositions that I 
never would have guessed,” he says. 


CORRECTION 

The Toolbox article ‘Democratic databases: 
science on GitHub’ (Nature 538, 127-128; 
2016) misstated how the Git software 
records changes in files. It does in fact 
maintain multiple versions of the files. 


SOURCE: OSP 
POSTGRADUATE STUDIES 


Find the best fit 


Choices for doctoral programmes can seem endless, so look 
for one that matches your interests and personality. 


BY KENDALL POWELL 


There’s much to consider when you’re 
trying to choose the university and 
programme for your science PhD. But 
the main reason for your selection must be 
that it suits you — not that you don't know 
what else to do, not the institution's or depart- 
ment’s reputation, not that a star researcher 
in your field is a faculty member there. 
Getting a PhD is hard enough, says Bruce 
Horazdovsky, associate dean for the Mayo 
Graduate School in Rochester, Minnesota. 
You don’t want to make it harder by being 
“miserable while you're doing it”, he says. 
“You have to be engaged and like what you are 
doing. The best programme in the country is 
the one that best fits you.” 

How do you find that best fit? Prospective 
doctoral students will need to consider 
several factors and compare programmes 
and schools. Deciding which universities to 
apply to means identifying programmes that 
match your research interests and personal- 
ity. You will need to evaluate how the school 
approaches career and professional devel- 
opment for its graduate students, and how 
its alumni fare after achieving their PhDs. 


Ultimately, the school you select will be the 
launch pad for your scientific career. 

Before you look at schools, you should 
have a clear idea of your chosen subfield of 
study. “Even at this stage, students ought to 
be thinking about what sort of specialization 
they want to do,” says David Bogle, pro-vice- 
provost of the doctoral school at University 
College London. He notes that a physics 
programme, for instance, could be great for 
astrophysics and string-theory research but 
offer nothing on materials science. 

Although it’s not necessary to narrow down 
fields too specifically, it is imperative to find 
a programme that has at least several faculty 
members who are doing research that excites 
you, says Bogle, who chairs the League of 
European Research Universities’ doctoral 
studies community in Leuven, Belgium. 
He advises students to look, not for a single 
high-profile researcher, but rather for a strong 
research environment with several professors 
working in similar areas. 

To get started, applicants can generally 
find descriptions of a school’s research pro- 
grammes and faculty members on the institu- 
tion’s website. Sometimes, more information 
is available: the European School of Molecu- 
lar Medicine (SEMM), a graduate programme 
shared between two universities and three 
research centres in Milan and Naples, Italy, 
publishes an annual list of faculty members 
who are taking new students in the coming 
year. Other institutions may publish similar 
material. 

Group websites can also give applicants a 
feel for the size and culture of a laboratory. 
Look for photos of lab outings or celebrations, 
for announcements of student achievements 
and publications, and for other evidence that 
graduate students drive much of the research 
in the group. Applicants should also look up a 
lab group’s latest research publications to get 
an idea of its members’ current interests and 
to see how well and how often students in the 
lab are publishing papers. “If the publications 
coming out of a lab are numerous and high 
quality, you can be pretty sure that you will get 
published by the end of your PhD” — which 
is essential for success after graduation, says 
Francesca Fiore, coordinator of the SEMM 
graduate office in Milan. 

Applicants should also seek advice and 
guidance from their undergraduate or master’s 
advisers to generate a shortlist of potential 
programmes. “Come talk to me,” says Andreas 
Berlind, an astrophysicist at Vanderbilt 
University in Nashville, Tennessee. “Let me 
help you make that initial list — it will save 
you a lot of time.” Advisers, he says, have 
enough deep knowledge of their field and 
its subfields to know which programmes tie 
in with the subjects a student is passionate 
about; they should also know where other 
researchers in that subfield are doing good 
work. For extra connections with the research 
world, applicants should try to attend large 
scientific conferences in their subfield; 
these often have travel fellowships so that 
undergraduates can attend. 

Students should also check online resources 
such as the US National Research Mentoring 
Network (nrmnet.net) and Facebook sites 
such as Equity Einstein, a group dedicated to 
making physics and astronomy more inclu- 
sive. Such resources will help students to 
connect with established researchers who 
can offer advice on training. They should also 
contact current graduate students in their sub- 
field to learn about programmes’ reputations. 
Applicants should not be shy about doing 
this, says Berlind, who is also Vanderbilt’s 
director of graduate studies in astrophysics. 
It is the best way to get honest answers about 
the culture and atmosphere in a programme, 
he adds (see ‘The value of hindsight’). Fiore 
also encourages correspondence with current 
students, especially for applicants who are 
pondering studying abroad. “Find someone 
from your home country,” if possible, she 
says, so that you can discuss their experience 
in your native language. 

Prospective students should never pin 
their hopes on working with one particular 
professor, because that person may not be 
taking students, may move away or might 
be a terrible fit as a mentor. If several faculty 
members are working in a similar area, the 
student has a better chance of landing a spot 
in one of those labs. Identify and contact 
at least two researchers, and ideally more, 
whom you’d like to do a PhD with, counsels 
Pamela McLean, director of neurobiology 
at Mayo Graduate School in Jacksonville, 
Florida. When you e-mail them, you can let 
them know of your interest in their work and 
find out whether they are taking on doctoral 
students in the next year. “A lot of times it will 
also strengthen your application,” she says. 
“Those names are often forwarded to admis- 
sions directors, and someone who has taken 
the initiative gets bonus points.” 


SHOW ME THE DATA 

More programmes are publishing data on 
their websites about their graduate students, 
including the average time taken to achieve 
a PhD. Students should pay particular atten- 
tion to this: anything much more than five 
years for US programmes or three for UK 
programmes can indicate that students 
are languishing in labs as labourers. Some 
institutions provide data on their graduates’ 
career choices — the University of Califor- 
nia, San Francisco, posts outcome data for 
most of its graduate-division programmes 
(go.nature.com/2dnyy89). It’s unusual for 
these data to be long-term enough to give a 
realistic picture of what all PhD holders are 
doing ten years after earning their degree, but 
it is still useful to scan such listings to see if 
doctoral graduates are ending up in careers 
that applicants consider desirable. “If they're 
not there, that’s a bad sign that the depart- 
ment doesn't see it as a priority to advertise 
how well students are doing,” says Berlind. 

Applicants should also determine whether 
they want to work on fundamental questions 
or do applied research. Students interested in 
the latter should seek programmes with strong 
ties to high-tech companies, the aerospace 
industry or hospitals, if their passion lies in 
those areas. For example, the Mayo Gradu- 
ate School is spread across three large medical 
campuses in Minnesota, Florida and Arizona. 

Students should also give some thought 
to the overall structure and organization of 
graduate programmes; these can be small and 
based in single departments or wide-ranging 
and interdisciplinary. Umbrella programmes 
(sometimes called structured programmes in 
the United Kingdom) pull in faculty members 
across several departments or campuses. These 
are in contrast to more conventional, single- 
department programmes, and in many cases 
they offer numerous labs and more options 
for cross-disciplinary studies. But what they 
make up for in quantity, they may lose in the 
quality of training or mentoring. Departmen- 
tal programmes often produce more close-knit 
communities, with seminars, journal clubs 
or other events geared specifically to their 
graduate students. 

Another structure is the bridge programme, 
which offers US students the chance to apply to 
a master’s programme that filters directly into 
a PhD programme on the same or a nearby 
campus. (The master’s-to-PhD route is common 
in the United Kingdom.) Such programmes are 
often a sound choice for those who feel that 
they need more preparation for doctoral stud- 
ies. LaNell Williams, a second-year biophysics 
student, found that the Fisk–Vanderbilt Bridge 
Program run by Fisk and Vanderbilt universities 
in Nashville, Tennessee, let her meet up with 
other students who were from groups that 
are under-represented in science. In contrast 
to her experience as the only woman of col- 
our in her undergraduate physics studies, 
Williams says that after a year in the Fisk– 
Vanderbilt programme, she feels comfort- 
able and has formed a community with fellow 
students. “I have been able to thrive,” she says, 
“and see myself as a physicist.” 


FISK UNIV. 


The Fisk–Vanderbilt Bridge Program links up students and mentors. 


THE VALUE OF HINDSIGHT 
What didn’t work for graduate-school applicants 


PhD students who applied to programmes 
in the past several years share the wisdom 
and insights they've acquired from their own 
experiences, and offer words of warning. 

Joseph Rodriguez, a postdoc at the 
Harvard-Smithsonian Center for Astrophysics 
in Cambridge, Massachusetts, warns 
students to be aware of the specialized, 
standardized tests required for admission 
to certain programmes — and to prepare 
for them well in advance. He signed up at 
the last minute to take the physics Graduate 
Record Examination (required for most US 
physics PhD programmes), scored poorly 
and wound up in a programme in which no 
one was working on extrasolar planets, his 
research interest. After treading water for 
many months, he transferred to Vanderbilt 
University, all of which cost valuable time. 

Priyanka Kothari, a third-year PhD student 
in biochemistry, cellular and molecular 
biology at Johns Hopkins University in 
Baltimore, Maryland, counsels applicants 
to keep a close eye on their prospective 
department’s culture and make-up when 
they are on campus. During an interview 
visit at one university, she noticed graduate 
students introducing themselves to 
one another — a clear warning that the 
department had not fostered a sense of 
community and mutual support. “I want 
to do great science, but I also want a 
relationship with other faculty members 
and other students in the department,” 
says Kothari. “That networking is so critical 
for becoming the best scientist.” She was 
dismayed during another interview when 
she noticed that the programme’s faculty 
members were mostly white men. “Either the 
department doesn’t care enough to change 
that or it doesn’t see it — both of which are 
red flags,” she says. 

Allatah Mekile, a second-year PhD student 
at Johns Hopkins University in Baltimore, 
Maryland, says that location played a part 
in her decision as she weighed up whether 
she would feel comfortable living near the 
university. Considerations included how far 
she’d have to travel to get groceries, whether 
she’d feel safe leaving campus late at night 
and whether she could easily bring her car 
with her. 

Rodriguez explains why students should 
think carefully when choosing programmes. 
“Graduate school is not just 5–6 years, but 
also working 50–60 hours per week and 
taking the hardest courses you’ve ever taken 
in your life,” he says. “You’re going to burn 
yourself out if you don’t really like what you 
are doing.” K.P. 

The doctoral application process is not too 
early to think about ultimate career goals, 
says Horazdovsky. “Those can change,” he 
says. “However, you need to make sure you 
will have tools or experiences to achieve 
your goal by the end of graduate school.” For 
example, students who think they want to 
work at a mainly undergraduate institution 
will want significant teaching experience. 
Students who aim for industry will need 
exposure to business, companies and the jobs 
that PhD holders occupy. Students should 
also find out whether their programme of 
choice hosts, or at least encourages students 
to attend, conferences and workshops that 
help them to build teaching, networking and 
communications skills. 

Many programmes include career- 
development components that give students 
real-world exposure to career tracks. These 
can be extremely helpful for students who are 
not aiming for an academic research posi- 
tion and can include university internships, 
external internships and other options. 


TRUE GRIT 

Students at the application stage need to 
stand out from the crowd to get accepted by 
their school of choice. David Charbonneau, 
director of graduate admissions for Harvard 
University’s department of astronomy in 
Cambridge, Massachusetts, looks for students 
who have persevered in the face of obstacles. 
“Most of what we do in science leads to 
dead ends,” he says. He seeks students who 
are passionate and hard-working, and who 
have demonstrated new ways of tackling
problems — for example, by working through 
solutions to an ambitious research problem 
for several years. These attributes should 
come across through concrete examples in 
their letters of recommendation, he says. 
McLean says applicants should personalize 
their application statements by including a 
paragraph explaining which faculty members 
within a programme they would like to work 
with, and why. 

If prospective PhD students are unsure 
whether graduate school is the right decision, 
they should take a year or two to work as a 
research assistant in an academic or industry 
lab before making the hefty commitment to 
doctoral studies. Taking that time is no longer 
viewed as a negative, says McLean, but instead 
shows that applicants have realistic expecta- 
tions and are aware of what's ahead. Allatah 
Mekile was uncertain of her next steps after 
finishing college at East Stroudsburg Uni- 
versity of Pennsylvania, so she moved home 
and took an entry-level position as a research 
associate at a supplier of nutritional prod- 
ucts. There, she worked for two years on a 
metabolic-engineering project before apply- 
ing to graduate programmes; she is now a 
second-year doctoral student in biochemistry, 
cellular and molecular biology at Johns 
Hopkins University in Baltimore, Maryland. 
She says that her experience in industry 
also helped her to explain in her application 
letter why and how certain programmes 
aligned with her career goals. 

By eschewing the conventional path of 
going immediately into a doctoral pro- 
gramme after earning a bachelor’s degree, 
and gambling that she’d be better prepared,
Mekile showed that she was ready for gradu- 
ate studies, says Bogle. “The whole point of 
going to graduate school is to take a bit of 
a risk. If you want to play it safe all the way 
through, then maybe graduate school, or
research, isn’t for you.”


Kendall Powell is a freelance science 
journalist based in Lafayette, Colorado. 
GENDER BALANCE
Culture clash 


Scientific disciplines that have a 
‘masculine culture’ tend to deter women 
from pursuing those fields, a study finds 
(S. Cheryan et al. Psychol. Bull.
http://dx.doi.org/10.1037/bul0000052; 2016).
The study analysed 1,200 publications 
looking at women's participation in 
science, technology, engineering and 
mathematics to learn why women are well 
represented in biology but not in physics, 
computer science and engineering. The 
authors found that the presence of negative 
stereotypes about women’s abilities and 
the lack of female role models were major 
factors in deterring women. But they also 
found that women who feared gender 
bias and discrimination might be more 
likely to avoid certain fields. Predictors of 
decreased participation included a lack of 
pre-university experience in the field and 
a lack of confidence. The low numbers
are also linked more to a failure to recruit
female students into the fields than to a failure
to retain them, suggest the authors.
Creating a more inclusive culture is the 
best way to boost female participation, 
the authors say. 


BIG PHARMA 
UK drugs outsourced 


Biopharmaceutical companies in the 
United Kingdom have cut research 
positions in drug discovery, according to
a report released by the Association of the
British Pharmaceutical Industry on
17 October, entitled The Changing UK
Drug Discovery Landscape. In the past 
decade, almost all large UK drugmakers 
slashed in-house research jobs in discovery, 
the earliest stage of drug development, 
when researchers usually test hundreds of 
thousands of compounds to find one that 
could move into the next stage. Overall, 
there has been a net loss of several hundred 
positions. At the same time, however, large 
companies have increased their investment 
in drug discovery through outsourcing and 
collaborations. A number of UK contract 
research organizations (CROs) reported
growth in partnerships with academic 
drug-discovery centres. Some CROs 
reported more drug-discovery employees, 
and about one-quarter of those reported 
staff increases of more than 25%. Yet some 
of the rise in CRO research jobs is also due 
to an increase in the number of contracts 
made with companies outside the United 
Kingdom, particularly in North America 
and the European Union, the report says. 
The UK biopharmaceutical industry 
employs more than 70,000 people. 
BLOOD WILL TELL


BY TOM EASTON & 
JACK MCDEVITT 


Andy Pharon didn’t know why
he spent an hour every morning
on FaceBook. Scandal!
Outrage! Funny pussycats! More 
outrage! He might have been reading 
a tabloid, except that FaceBook was 
more respectable. Which mattered, as 
he was in Larry’s. Martha came over. 
“Everything okay, Andy?” 

“Excellent.” He gave her his stand- 
ard thumbs-up. 

He was relieved moments later when his 
e-mail dinged. Sarah Mills, chief develop- 
ment officer at BioFutures Labs, wanted 
more ideas. Meeting at 10. Be there! 

He finished his sweet roll and sipped his 
coffee. More ideas. He had nothing, but he 
couldn't say that, could he? 

That was when the old guy with the roller 
bag squeezed between tables and stopped 
beside his chair. He was too well dressed to 
be a drifter but Andy still shook his head as
he turned away for another sip of coffee. 

“I thought I remembered this place,” the
guy said. “Came here every morning for five 
years.” 

Andy concentrated on his coffee cup and 
said nothing. Give ’em an inch, and they'll
take a mile. Ten miles. 

The guy looked down at him. “Hi, Andy. 
How’s it going?”

“You know my name?” 

“Sure. I’m you.”

“What?” His face was lined and seamed, 
age spots, hardly any hair. Fifty years older 
than Andy. “Would you please go away?” 

“We'll get time travel in about 30 years.” 
He smiled. “I need a favour.”

If this had been an e-mail, he would have 
hit delete. “Go away, gramps!” 

The guy sighed. “I knew you would react 
that way. That I would. That I had. But I’m
not a scammer. I don't want your money.
And I already have your ID.” He pulled out
a chair and lowered himself into it. Then he 
produced a wallet. “See?” 

Driver's licence. His picture with the name 
Andrew Pharon. Birth date was correct. Issue 
date: 2072. That would make him over 80. 

Andy stared at him. The guy was smiling. 


“What do you want?”

The smile faded. “Some of your blood.”

Andy sat frozen. Had his life turned
into a vampire fantasy?

“Just some plasma, actually.”

“Why?” 

“Your people are already working on it. 
Putting young plasma into an old body can 
turn the clock back.”

Andy nodded. If it was true... “But why 
me?” Even as he spoke, he knew the answer. 
His own young plasma would work better 
than anyone else's. He really was a time- 
traveller. 

Andrew grinned and delivered his stand- 
ard thumbs-up, removing all doubt. 

“Andy!” Martha waved at him. “You gonna 
be late!” 

He waved back. This was one reason he 
liked Larry's. They cared. 

The old guy was still sitting there, waiting 
for his response. But it was ridiculous. Time 
travel wasn't possible. “You have got to be 
pulling my leg.” 

The guy shook his head. “No. I just need 
a couple of pints today, and again next week 
and the week after.” He looked at his bag. 
“The equipment’s right here.”

“I’m sure it is. But there’s no way I’m letting
you stick needles in me. And I’ve got to
run.” Andy tucked his tablet into his briefcase
and stood.

“But...!” He looked stricken, as if he had 
never dreamt that his own self would turn 
him down. “But I’m you! We're even closer 
than blood kin!” 

“Pardon me. I have to leave.” Incredibly, 
the guy was smiling as Andy went out the 
door. 


Andy glanced over his shoulder and headed 
down the sidewalk, barely noticing the fumes 
of the remaining gasburners or the fragrance 
of the vagrant at the corner. The old guy 
wasn't following him. Thank God. Maybe 
he should switch coffee shops for a few days. 
But then the guy might just show up on his 
doorstep. That would freak the hell
out of his girlfriend.

Okay. Now he had to come up
with an idea for Sarah.

BioFutures focused on the microbiome.
Their last big success was a
probiotic ointment for getting rid
of acne. Lately they’d been working
on figuring out how to manipulate
bacteria in the gut to control obesity.
They were close, which was why they
needed new ideas. Had to keep the
pipeline flowing.

Maybe the old guy had something?
Not time travel. But he recalled reading
something about plasma and ageing. It
wouldn’t take long to check.

Once in the building, he went directly to 
his cube and started the search. And yes, 
they were working on it, testing it on people 
and making slow progress. The idea went 
back a century, when someone spliced the 
veins of a young mouse and an old mouse 
together. The old one got perkier, healthier, 
younger. The young one aged. 

And plasma could be frozen. 

He almost laughed. 

It took him an hour to write the proposal. 
Start with some research into whether one’s 
own young plasma is really better than 
a stranger’s. Use mice, where the differ- 
ence between young and old isn't great. If 
it checks out, then start collecting plasma, 
freeze it, store it, and when the donor turns 
into an old guy... 

He thought Sarah would like it. It was the 
perfect business plan, complete with references
and links. Sell a promise, much like the
old cryonics scam. Collect the money now, 
and worry later about whether the product 
actually works. Although this one seemed 
much more likely to be a success than cryon- 
ics ever had. 

He would be among the very first to bank 
his plasma. And his older self knew how it 
had worked out. No wonder he’d sat there
smiling when Andy walked out.


Tom Easton is a retired theoretical biologist 
who has written science-fiction novels and 
criticism and edited anthologies in addition 
to more academic work. Jack McDevitt
is a prolific, award-winning novelist with
an abiding interest in alien contact. A
Philadelphia native, he has been, among 
other things, a naval officer, an English 
teacher and a management trainer for the 
US Customs Service. 


ILLUSTRATION BY JACEY 


